<h1>Explaining A- and B-Basis Values</h1>
<p>Stefan Kloppenborg, 2022-11-21</p>
<p>I’ve been thinking about how to explain an A- or B-Basis value
to people without much statistical knowledge.
These are the names used in aircraft certification for the
lower tolerance bounds on material strength.
The definition used for transport category aircraft is given in
<a href="https://drs.faa.gov/browse/excelExternalWindow/B30DF8E84C71FFBD86256D7A00712EAD.0001">14 <span class="caps">CFR</span> 25.613</a>
(the same definition is used for other categories of aircraft too).
This definition is precise, but not easy to understand.</p>
<blockquote>
<p>…
(b) Material design values must be chosen to minimize the probability of
structural failures due to material variability. …
compliance must be shown by selecting material design values which
assure material strength with the following probability:</p>
<p>(1) Where applied loads are eventually distributed through a single
member within an assembly, the failure of which would result in loss
of structural integrity of the component, 99 percent probability
with 95 percent confidence.</p>
<p>(2) For redundant structure, in which the failure of individual elements
would result in applied loads being safely distributed to other load
carrying members, 90 percent probability with 95 percent confidence. …</p>
</blockquote>
<p>Another way of stating this definition: an A- or B-Basis value is the
lower 95% confidence bound on the 1st or 10th percentile of the population,
respectively. But describing it that way doesn’t help to explain the concept
to a person who’s not well versed in statistics.</p>
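<p>For a normal population, this lower confidence bound can be computed directly from the non-central t-distribution (the same distribution that shows up again near the end of this post). Here is a minimal sketch in Python using <code>scipy</code>; the rest of this post uses R, so this is just an illustrative cross-check, and the helper name <code>k_b</code> is my own:</p>

```python
from math import sqrt
from scipy.stats import nct, norm

def k_b(n):
    # One-sided tolerance factor for a B-Basis value:
    # 90% probability content with 95% confidence, assuming normality.
    ncp = norm.ppf(0.90) * sqrt(n)
    return nct.ppf(0.95, df=n - 1, nc=ncp) / sqrt(n)

# The B-Basis is then: sample mean - k_b(n) * sample standard deviation.
print(k_b(10))
```

<p>The factor shrinks as the sample size grows, reflecting the reduced uncertainty from a larger sample.</p>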
<h1>The Explanation</h1>
<p>There’s some random variation in all material properties. Some pieces of any
material will be a little bit different than other pieces of the same material.
To account for this variation in the material properties
when we design aircraft structure, we design it so that there’s at least a 90%
chance that redundant structure is stronger than it needs to be (or at least a
99% chance for non-redundant structure).</p>
<p>When we test a material property, we get a <em>sample</em>. This <em>sample</em> is not a
perfect representation of the material property. A good analogy is that a
sample is like a low-resolution photo: it gives us an idea of what we’re seeing,
but we don’t get all the detail. We can get a better idea of what we’re seeing by taking
a higher resolution photo: this is akin to testing more and getting a larger
sample size.</p>
<p>We choose a statistical distribution that fits the data, then find the 10th
(or 1st) percentile of that distribution. But since we only have a sample of
the material property (a “low-resolution photo”, in the analogy), we’re not sure
if the distribution that we chose is correct.
To account for that uncertainty, we try out many possible distributions for the
material property and
determine how likely each is to be true based on the sample (the data).
Distributions that look a lot like the data are highly likely; distributions
that look different than the data are less likely, but depending on how
“low-resolution” our data is, they <em>could</em> be correct.
For each of these possible distributions, we find the 10th percentile
(for B-Basis; it would be the 1st percentile for A-Basis).
Next, we weight each of those individual 10th percentiles based on the
likelihood that the corresponding distribution is true, and we find a lower bound
where 95% of those weighted 10th percentiles are above that lower bound.</p>
<p>Or in graphical form:</p>
<p><img alt="The explanatory graph" src="explaining-basis-values_files/figure-markdown/explanation-graph-1.png"></p>
<p>I hope that this explanation makes this complicated topic a little clearer.
If you think you have a better explanation, please connect with me on
<a href="https://linkedin.com/in/stefankloppenborg/">LinkedIn</a> and message me there.</p>
<h1>Developing the Graph</h1>
<p>Let’s look at how I developed this graph. The graph was developed using the R
language. As with most R scripts, we start by loading the required packages.</p>
<div class="highlight"><pre><span></span><code><span class="nf">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span>
<span class="nf">library</span><span class="p">(</span><span class="n">cmstatr</span><span class="p">)</span>
<span class="nf">library</span><span class="p">(</span><span class="n">stats4</span><span class="p">)</span>
</code></pre></div>
<p>In this example, we’ll use some of the sample data that comes with the
<a href="https://www.cmstatr.net"><code>cmstatr</code></a> package. We’ll be using the
room-temperature warp-tension example data.</p>
<div class="highlight"><pre><span></span><code><span class="n">dat</span> <span class="o"><-</span> <span class="n">carbon.fabric.2</span> <span class="o">%>%</span>
<span class="nf">filter</span><span class="p">(</span><span class="n">test</span> <span class="o">==</span> <span class="s">"WT"</span> <span class="o">&</span> <span class="n">condition</span> <span class="o">==</span> <span class="s">"RTD"</span><span class="p">)</span>
</code></pre></div>
<p>Let’s start by plotting this data. We’re plotting 1-D data, so we only need
one axis. But in order to make sure that none of the data points overlap,
we’ll add some “jitter” (random vertical position).
We’ll also hide the vertical axis since this axis is not meaningful.</p>
<div class="highlight"><pre><span></span><code><span class="n">dat</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">strength</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_jitter</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0.01</span><span class="p">),</span> <span class="n">height</span> <span class="o">=</span> <span class="m">0.005</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">,</span>
<span class="n">breaks</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-3-1" src="https://www.kloppenborg.ca/2022/11/explaining-basis-values/explaining-basis-values_files/figure-markdown/unnamed-chunk-3-1.png"></p>
<p>We can fit a normal distribution to this data. The sample mean and standard
deviation are point-estimates of the mean and standard deviation of the
distribution. We’ll use those point-estimates and draw the <span class="caps">PDF</span> superimposed
over the data, assuming that the distribution is normal.
We can also add the 10th percentile of this distribution to the plot.</p>
<div class="highlight"><pre><span></span><code><span class="n">dat</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">strength</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_jitter</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0.01</span><span class="p">),</span> <span class="n">height</span> <span class="o">=</span> <span class="m">0.005</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">"magenta"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">stat_function</span><span class="p">(</span><span class="n">fun</span> <span class="o">=</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="nf">dnorm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="nf">sd</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">)))</span> <span class="o">+</span>
<span class="nf">geom_vline</span><span class="p">(</span><span class="n">xintercept</span> <span class="o">=</span> <span class="nf">qnorm</span><span class="p">(</span><span class="m">0.1</span><span class="p">,</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span> <span class="nf">sd</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">)),</span>
<span class="n">color</span> <span class="o">=</span> <span class="s">"blue"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">,</span>
<span class="n">breaks</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-4-1" src="https://www.kloppenborg.ca/2022/11/explaining-basis-values/explaining-basis-values_files/figure-markdown/unnamed-chunk-4-1.png"></p>
<p>But, the distribution that we’ve drawn is just a point-estimate from the data.
There is uncertainty in our estimate. Based on the data, we’ve concluded that
this estimate is the most likely, but we shouldn’t be surprised if the true
population distribution is a bit different. This point-estimate is actually
the Maximum Likelihood Estimate (<span class="caps">MLE</span>), based on this particular data.
We can calculate the likelihood of various potential estimates of the
distribution (or rather, of its parameters) using the following equation:</p>
<p>$$
L\left(\mu, \sigma\right) = \prod_{i=1}^{n} f\left(X_i;\,\mu, \sigma\right) $$</p>
<p>Here $X_i$ are the data, and the two parameters of the normal distribution
are $\mu$ and $\sigma$. The function $f()$ is the probability density function.</p>
<p>It turns out that computers have trouble multiplying many small numbers
together: the product can underflow to zero in floating-point arithmetic. We
can avoid this problem by using a log transform:</p>
<p>$$
\mathcal{L}\left(\mu, \sigma\right) = \sum_{i=1}^{n} \log f\left(X_i;\,\mu, \sigma\right) $$</p>
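<p>To see the underflow problem concretely: the product of a thousand small density values underflows to zero in double precision, while the sum of their logs stays accurate. (A quick demonstration in Python with <code>numpy</code>, just for illustration; the post itself uses R.)</p>

```python
import numpy as np

densities = np.full(1000, 1e-4)   # a thousand small density values

print(np.prod(densities))         # underflows to 0.0 (true value is 1e-4000)
print(np.log(densities).sum())    # about -9210.34, computed accurately
```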
<p>Implementing this in R:</p>
<div class="highlight"><pre><span></span><code><span class="n">log_likelihood</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">sum</span><span class="p">(</span><span class="nf">dnorm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">,</span> <span class="n">log</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div>
<p>To make sure that this function works, we can find the log likelihood of our <span class="caps">MLE</span>
of the parameters. The actual numerical value of the likelihood doesn’t mean
very much to us, but we’ll be interested in the <em>distribution</em> of the
likelihoods as we change the parameters.</p>
<div class="highlight"><pre><span></span><code><span class="nf">log_likelihood</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">,</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span> <span class="nf">sd</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">))</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] -92.55627
</code></pre></div>
<p>It’s going to make our life a lot easier if we can work with a single parameter
instead of two ($\mu$ and $\sigma$). We’ll treat $\sigma$ as a nuisance
parameter and find the value of $\sigma$ that produces the greatest
likelihood for any given value of $\mu$. To avoid working with very tiny
numbers, we’ll calculate the relative likelihood (the likelihood divided by
the maximum likelihood). We can do this in R as follows:</p>
<div class="highlight"><pre><span></span><code><span class="n">rel_likelihood_mu</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ll_hat</span> <span class="o"><-</span> <span class="nf">log_likelihood</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="nf">mean</span><span class="p">(</span><span class="n">x</span><span class="p">),</span> <span class="nf">sd</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
<span class="n">opt</span> <span class="o"><-</span> <span class="nf">optimize</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">sigma</span><span class="p">)</span> <span class="nf">exp</span><span class="p">(</span><span class="nf">log_likelihood</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sigma</span><span class="p">)</span> <span class="o">-</span> <span class="n">ll_hat</span><span class="p">),</span>
<span class="n">lower</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span>
<span class="n">upper</span> <span class="o">=</span> <span class="m">20</span> <span class="o">*</span> <span class="nf">sd</span><span class="p">(</span><span class="n">x</span><span class="p">),</span> <span class="c1"># pick an upper bound that's big</span>
<span class="n">maximum</span> <span class="o">=</span> <span class="kc">TRUE</span>
<span class="p">)</span>
<span class="c1"># We'll return a list of the sigma and the relative likelihood:</span>
<span class="nf">list</span><span class="p">(</span>
<span class="n">sigma</span> <span class="o">=</span> <span class="n">opt</span><span class="o">$</span><span class="n">maximum</span><span class="p">,</span>
<span class="n">rel_likelihood</span> <span class="o">=</span> <span class="n">opt</span><span class="o">$</span><span class="n">objective</span>
<span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>We can also do the same thing to calculate the relative likelihood of a
particular 10th percentile ($x_p$). We use the transformation
$\mu = x_p - \sigma \Phi^{-1}(0.1)$, where $\Phi^{-1}$ is the standard
normal quantile function (<code>qnorm</code> in R).</p>
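<p>A quick sanity check of this transformation (sketched in Python rather than R, with made-up numbers): if we set $\mu$ from a chosen 10th percentile $x_p$ this way, the 10th percentile of the resulting normal distribution is $x_p$ again.</p>

```python
from scipy.stats import norm

xp, sigma = 100.0, 5.0            # hypothetical 10th percentile and sigma
mu = xp - sigma * norm.ppf(0.1)   # the transformation above

# The 10th percentile of N(mu, sigma) recovers xp (up to rounding):
print(norm.ppf(0.1, loc=mu, scale=sigma))
```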
<div class="highlight"><pre><span></span><code><span class="n">rel_likelihood_xp</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">xp</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ll_hat</span> <span class="o"><-</span> <span class="nf">log_likelihood</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="nf">mean</span><span class="p">(</span><span class="n">x</span><span class="p">),</span> <span class="nf">sd</span><span class="p">(</span><span class="n">x</span><span class="p">))</span>
<span class="n">opt</span> <span class="o"><-</span> <span class="nf">optimize</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">sigma</span><span class="p">)</span> <span class="nf">exp</span><span class="p">(</span><span class="nf">log_likelihood</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">xp</span> <span class="o">-</span> <span class="n">sigma</span> <span class="o">*</span> <span class="nf">qnorm</span><span class="p">(</span><span class="m">0.1</span><span class="p">),</span> <span class="n">sigma</span><span class="p">)</span> <span class="o">-</span> <span class="n">ll_hat</span><span class="p">),</span>
<span class="n">lower</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span>
<span class="n">upper</span> <span class="o">=</span> <span class="m">20</span> <span class="o">*</span> <span class="nf">sd</span><span class="p">(</span><span class="n">x</span><span class="p">),</span> <span class="c1"># pick an upper bound that's big</span>
<span class="n">maximum</span> <span class="o">=</span> <span class="kc">TRUE</span>
<span class="p">)</span>
<span class="c1"># We'll return a list of the sigma and the relative likelihood:</span>
<span class="nf">list</span><span class="p">(</span>
<span class="n">sigma</span> <span class="o">=</span> <span class="n">opt</span><span class="o">$</span><span class="n">maximum</span><span class="p">,</span>
<span class="n">rel_likelihood</span> <span class="o">=</span> <span class="n">opt</span><span class="o">$</span><span class="n">objective</span>
<span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>Now, we can draw our same plot again, but this time, we’ll draw a bunch of
potential distributions (and 10th percentiles), coloring them according
to their likelihood:</p>
<div class="highlight"><pre><span></span><code><span class="n">p</span> <span class="o"><-</span> <span class="n">dat</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">strength</span><span class="p">))</span>
<span class="nf">walk</span><span class="p">(</span>
<span class="nf">seq</span><span class="p">(</span><span class="n">from</span> <span class="o">=</span> <span class="m">0.95</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">to</span> <span class="o">=</span> <span class="m">1.05</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">length.out</span> <span class="o">=</span> <span class="m">55</span><span class="p">),</span>
<span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rl</span> <span class="o"><-</span> <span class="nf">rel_likelihood_mu</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">,</span> <span class="n">mu</span><span class="p">)</span>
<span class="n">p</span> <span class="o"><<-</span> <span class="n">p</span> <span class="o">+</span> <span class="nf">stat_function</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">alpha</span> <span class="o">=</span> <span class="n">rl</span><span class="o">$</span><span class="n">rel_likelihood</span><span class="p">),</span>
<span class="n">fun</span> <span class="o">=</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="nf">dnorm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">rl</span><span class="o">$</span><span class="n">sigma</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="n">xp_dist</span> <span class="o"><-</span> <span class="nf">imap_dfr</span><span class="p">(</span>
<span class="nf">seq</span><span class="p">(</span><span class="n">from</span> <span class="o">=</span> <span class="m">0.9</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">to</span> <span class="o">=</span> <span class="m">0.98</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">length.out</span> <span class="o">=</span> <span class="m">55</span><span class="p">),</span>
<span class="nf">function</span><span class="p">(</span><span class="n">xp</span><span class="p">,</span> <span class="n">ii</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rl</span> <span class="o"><-</span> <span class="nf">rel_likelihood_xp</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">,</span> <span class="n">xp</span><span class="p">)</span>
<span class="nf">data.frame</span><span class="p">(</span><span class="n">xp</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">rl</span> <span class="o">=</span> <span class="n">rl</span><span class="o">$</span><span class="n">rel_likelihood</span><span class="p">)</span>
<span class="p">})</span>
<span class="n">p</span> <span class="o">+</span>
<span class="nf">geom_vline</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">xintercept</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">alpha</span> <span class="o">=</span> <span class="n">rl</span><span class="p">),</span> <span class="n">data</span> <span class="o">=</span> <span class="n">xp_dist</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">"blue"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">geom_jitter</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0.01</span><span class="p">),</span> <span class="n">height</span> <span class="o">=</span> <span class="m">0.005</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">"magenta"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">,</span>
<span class="n">breaks</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-9-1" src="https://www.kloppenborg.ca/2022/11/explaining-basis-values/explaining-basis-values_files/figure-markdown/unnamed-chunk-9-1.png"></p>
<p>Next, we can add a plot of the distribution of the 10th percentiles.
We’ll also plot the B-Basis, as calculated by the package
<a href="https://www.cmstatr.net"><code>cmstatr</code></a>.</p>
<div class="highlight"><pre><span></span><code><span class="n">p</span> <span class="o"><-</span> <span class="n">dat</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">f</span> <span class="o">=</span> <span class="m">1</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">strength</span><span class="p">))</span>
<span class="nf">walk</span><span class="p">(</span>
<span class="nf">seq</span><span class="p">(</span><span class="n">from</span> <span class="o">=</span> <span class="m">0.95</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">to</span> <span class="o">=</span> <span class="m">1.05</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">length.out</span> <span class="o">=</span> <span class="m">55</span><span class="p">),</span>
<span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rl</span> <span class="o"><-</span> <span class="nf">rel_likelihood_mu</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">,</span> <span class="n">mu</span><span class="p">)</span>
<span class="n">p</span> <span class="o"><<-</span> <span class="n">p</span> <span class="o">+</span> <span class="nf">stat_function</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">alpha</span> <span class="o">=</span> <span class="n">rl</span><span class="o">$</span><span class="n">rel_likelihood</span><span class="p">),</span>
<span class="n">fun</span> <span class="o">=</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="nf">dnorm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">rl</span><span class="o">$</span><span class="n">sigma</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="n">xp_dist</span> <span class="o"><-</span> <span class="nf">imap_dfr</span><span class="p">(</span>
<span class="nf">seq</span><span class="p">(</span><span class="n">from</span> <span class="o">=</span> <span class="m">0.9</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">to</span> <span class="o">=</span> <span class="m">0.98</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">length.out</span> <span class="o">=</span> <span class="m">55</span><span class="p">),</span>
<span class="nf">function</span><span class="p">(</span><span class="n">xp</span><span class="p">,</span> <span class="n">ii</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rl</span> <span class="o"><-</span> <span class="nf">rel_likelihood_xp</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">,</span> <span class="n">xp</span><span class="p">)</span>
<span class="nf">data.frame</span><span class="p">(</span><span class="n">xp</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">rl</span> <span class="o">=</span> <span class="n">rl</span><span class="o">$</span><span class="n">rel_likelihood</span><span class="p">)</span>
<span class="p">})</span>
<span class="n">p</span> <span class="o">+</span>
<span class="nf">geom_vline</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">xintercept</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">alpha</span> <span class="o">=</span> <span class="n">rl</span><span class="p">),</span> <span class="n">data</span> <span class="o">=</span> <span class="n">xp_dist</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">"blue"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">geom_vline</span><span class="p">(</span>
<span class="nf">aes</span><span class="p">(</span><span class="n">xintercept</span> <span class="o">=</span> <span class="nf">basis_normal</span><span class="p">(</span><span class="n">dat</span><span class="p">,</span> <span class="n">strength</span><span class="p">)</span><span class="o">$</span><span class="n">basis</span><span class="p">),</span>
<span class="n">color</span> <span class="o">=</span> <span class="s">"red"</span><span class="p">,</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">xp_dist</span> <span class="o">%>%</span> <span class="nf">mutate</span><span class="p">(</span><span class="n">f</span> <span class="o">=</span> <span class="m">2</span><span class="p">),</span>
<span class="n">inherit.aes</span> <span class="o">=</span> <span class="kc">FALSE</span>
<span class="p">)</span> <span class="o">+</span>
<span class="nf">geom_line</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">rl</span><span class="p">),</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">xp_dist</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">f</span> <span class="o">=</span> <span class="m">2</span><span class="p">),</span>
<span class="n">inherit.aes</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span>
<span class="n">color</span> <span class="o">=</span> <span class="s">"black"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">geom_jitter</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0.01</span><span class="p">),</span> <span class="n">height</span> <span class="o">=</span> <span class="m">0.005</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">"magenta"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">facet_grid</span><span class="p">(</span><span class="n">f</span> <span class="o">~</span> <span class="n">.</span><span class="p">,</span>
<span class="n">scales</span> <span class="o">=</span> <span class="s">"free_y"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">,</span>
<span class="n">breaks</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">theme</span><span class="p">(</span><span class="n">strip.text</span> <span class="o">=</span> <span class="nf">element_blank</span><span class="p">())</span>
</code></pre></div>
<p><img alt="unnamed-chunk-10-1" src="https://www.kloppenborg.ca/2022/11/explaining-basis-values/explaining-basis-values_files/figure-markdown/unnamed-chunk-10-1.png"></p>
<p>The astute reader might recognize the lower curve as a non-central
t-distribution. Since it’s a relative likelihood and not a probability,
the vertical scale (which is hidden) won’t match the non-central t-distribution,
but it’s the same shape. Just for fun, we can plot the lower curve shown above
and a non-central t-distribution:</p>
<div class="highlight"><pre><span></span><code><span class="nf">bind_rows</span><span class="p">(</span>
<span class="n">xp_dist</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">f</span> <span class="o">=</span> <span class="s">"Relative Likelihood"</span><span class="p">),</span>
<span class="n">xp_dist</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">n</span> <span class="o">=</span> <span class="nf">length</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">rl</span> <span class="o">=</span> <span class="nf">dt</span><span class="p">(</span><span class="nf">sqrt</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">)</span> <span class="o">-</span> <span class="n">xp</span><span class="p">)</span> <span class="o">/</span> <span class="nf">sd</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span> <span class="n">df</span> <span class="o">=</span> <span class="n">n</span> <span class="o">-</span> <span class="m">1</span><span class="p">,</span> <span class="n">ncp</span> <span class="o">=</span> <span class="nf">qnorm</span><span class="p">(</span><span class="m">0.9</span><span class="p">)</span> <span class="o">*</span> <span class="nf">sqrt</span><span class="p">(</span><span class="n">n</span><span class="p">)),</span>
<span class="n">f</span> <span class="o">=</span> <span class="s">"t-Distribution"</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">select</span><span class="p">(</span><span class="o">-</span><span class="nf">c</span><span class="p">(</span><span class="n">n</span><span class="p">))</span>
<span class="p">)</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">rl</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_line</span><span class="p">()</span> <span class="o">+</span>
<span class="nf">facet_grid</span><span class="p">(</span><span class="n">f</span> <span class="o">~</span> <span class="n">.</span><span class="p">,</span> <span class="n">scales</span> <span class="o">=</span> <span class="s">"free_y"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">ylab</span><span class="p">(</span><span class="s">""</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-11-1" src="https://www.kloppenborg.ca/2022/11/explaining-basis-values/explaining-basis-values_files/figure-markdown/unnamed-chunk-11-1.png"></p>
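<p>As an aside, this connection to the non-central t-distribution is exactly what
makes normal-based basis values computable in closed form: the basis value is
$\bar{x} - k s$, where the one-sided tolerance factor $k$ comes from a
non-central t quantile. Below is a minimal Python sketch of that calculation
(the function name and the simulated sample are mine, not from the post; the
R code in this post uses <code>basis_normal()</code> instead):</p>

```python
import numpy as np
from scipy import stats

def b_basis_normal(x, p=0.90, conf=0.95):
    """Lower tolerance bound: `conf` confidence that at least a fraction
    `p` of the population lies above the returned value (normality assumed)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # one-sided tolerance factor from the non-central t-distribution
    ncp = stats.norm.ppf(p) * np.sqrt(n)
    k = stats.nct.ppf(conf, df=n - 1, nc=ncp) / np.sqrt(n)
    return x.mean() - k * x.std(ddof=1)

# simulated strength data, for illustration only
rng = np.random.default_rng(0)
sample = rng.normal(loc=100.0, scale=5.0, size=18)
print(round(b_basis_normal(sample), 2))
```

<p>For a sample of 18, the factor $k$ works out to about 2.0, which matches
published one-sided tolerance-factor tables for 90% probability with
95% confidence.</p>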
<p>Turning our attention back to creating the graph for this blog post,
we’ll improve the aesthetics of the graph and also add the annotations:</p>
<div class="highlight"><pre><span></span><code><span class="n">p</span> <span class="o"><-</span> <span class="n">dat</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">f</span> <span class="o">=</span> <span class="m">1</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">strength</span><span class="p">))</span>
<span class="nf">walk</span><span class="p">(</span>
<span class="nf">seq</span><span class="p">(</span><span class="n">from</span> <span class="o">=</span> <span class="m">0.95</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">to</span> <span class="o">=</span> <span class="m">1.05</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">length.out</span> <span class="o">=</span> <span class="m">55</span><span class="p">),</span>
<span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rl</span> <span class="o"><-</span> <span class="nf">rel_likelihood_mu</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">,</span> <span class="n">mu</span><span class="p">)</span>
<span class="n">p</span> <span class="o"><<-</span> <span class="n">p</span> <span class="o">+</span> <span class="nf">stat_function</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">alpha</span> <span class="o">=</span> <span class="n">rl</span><span class="o">$</span><span class="n">rel_likelihood</span><span class="p">),</span>
<span class="n">fun</span> <span class="o">=</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="nf">dnorm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mu</span><span class="p">,</span> <span class="n">rl</span><span class="o">$</span><span class="n">sigma</span><span class="p">))</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="n">xp_dist</span> <span class="o"><-</span> <span class="nf">imap_dfr</span><span class="p">(</span>
<span class="nf">seq</span><span class="p">(</span><span class="n">from</span> <span class="o">=</span> <span class="m">0.9</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">to</span> <span class="o">=</span> <span class="m">0.98</span> <span class="o">*</span> <span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">),</span>
<span class="n">length.out</span> <span class="o">=</span> <span class="m">55</span><span class="p">),</span>
<span class="nf">function</span><span class="p">(</span><span class="n">xp</span><span class="p">,</span> <span class="n">ii</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rl</span> <span class="o"><-</span> <span class="nf">rel_likelihood_xp</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength</span><span class="p">,</span> <span class="n">xp</span><span class="p">)</span>
<span class="nf">data.frame</span><span class="p">(</span><span class="n">xp</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">rl</span> <span class="o">=</span> <span class="n">rl</span><span class="o">$</span><span class="n">rel_likelihood</span><span class="p">)</span>
<span class="p">})</span>
<span class="n">p</span> <span class="o">+</span>
<span class="nf">geom_vline</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">xintercept</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">alpha</span> <span class="o">=</span> <span class="n">rl</span><span class="p">),</span> <span class="n">data</span> <span class="o">=</span> <span class="n">xp_dist</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">"blue"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">geom_vline</span><span class="p">(</span>
<span class="nf">aes</span><span class="p">(</span><span class="n">xintercept</span> <span class="o">=</span> <span class="nf">basis_normal</span><span class="p">(</span><span class="n">dat</span><span class="p">,</span> <span class="n">strength</span><span class="p">)</span><span class="o">$</span><span class="n">basis</span><span class="p">),</span>
<span class="n">color</span> <span class="o">=</span> <span class="s">"red"</span><span class="p">,</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">xp_dist</span> <span class="o">%>%</span> <span class="nf">mutate</span><span class="p">(</span><span class="n">f</span> <span class="o">=</span> <span class="m">2</span><span class="p">),</span>
<span class="n">inherit.aes</span> <span class="o">=</span> <span class="kc">FALSE</span>
<span class="p">)</span> <span class="o">+</span>
<span class="nf">geom_line</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">xp</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">rl</span><span class="p">),</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">xp_dist</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">f</span> <span class="o">=</span> <span class="m">2</span><span class="p">),</span>
<span class="n">inherit.aes</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">,</span>
<span class="n">color</span> <span class="o">=</span> <span class="s">"black"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">geom_jitter</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">y</span> <span class="o">=</span> <span class="m">0.01</span><span class="p">),</span> <span class="n">height</span> <span class="o">=</span> <span class="m">0.005</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="s">"magenta"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">facet_grid</span><span class="p">(</span><span class="n">f</span> <span class="o">~</span> <span class="n">.</span><span class="p">,</span>
<span class="n">scales</span> <span class="o">=</span> <span class="s">"free_y"</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">theme_bw</span><span class="p">()</span> <span class="o">+</span>
<span class="nf">guides</span><span class="p">(</span><span class="n">alpha</span> <span class="o">=</span> <span class="nf">guide_none</span><span class="p">())</span> <span class="o">+</span>
<span class="nf">scale_y_continuous</span><span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">,</span>
<span class="n">breaks</span> <span class="o">=</span> <span class="kc">NULL</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">theme</span><span class="p">(</span><span class="n">strip.text</span> <span class="o">=</span> <span class="nf">element_blank</span><span class="p">())</span> <span class="o">+</span>
<span class="nf">geom_text</span><span class="p">(</span>
<span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">y</span><span class="p">,</span> <span class="n">f</span> <span class="o">=</span> <span class="n">f</span><span class="p">,</span> <span class="n">label</span> <span class="o">=</span> <span class="n">label</span><span class="p">),</span>
<span class="n">data</span> <span class="o">=</span> <span class="nf">tribble</span><span class="p">(</span>
<span class="o">~</span><span class="n">x</span><span class="p">,</span> <span class="o">~</span><span class="n">y</span><span class="p">,</span> <span class="o">~</span><span class="n">f</span><span class="p">,</span> <span class="o">~</span><span class="n">label</span><span class="p">,</span>
<span class="m">140</span><span class="p">,</span> <span class="m">0.025</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span> <span class="s">"(1) The data tells us which\ndistributions are most likely."</span><span class="p">,</span>
<span class="m">140</span><span class="p">,</span> <span class="m">0.00</span><span class="p">,</span> <span class="m">1</span><span class="p">,</span>
<span class="s">"...but we don't know the true distribution."</span><span class="p">,</span>
<span class="m">140</span><span class="p">,</span> <span class="m">0.7</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span>
<span class="s">"(2) The data also tells us which\n10th percentiles are likely."</span><span class="p">,</span>
<span class="m">123</span><span class="p">,</span> <span class="m">0.6</span><span class="p">,</span> <span class="m">2</span><span class="p">,</span>
<span class="s">"(3) Considering the\nlikelihood of all the\npossible 10th\npercentiles, there is\n95% confidence that\nthe true value is above\nthe B-Basis."</span>
<span class="p">),</span>
<span class="n">color</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="s">"black"</span><span class="p">,</span> <span class="s">"black"</span><span class="p">,</span> <span class="s">"blue"</span><span class="p">,</span> <span class="s">"red"</span><span class="p">)</span>
<span class="p">)</span> <span class="o">+</span>
<span class="nf">xlim</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="m">120</span><span class="p">,</span> <span class="m">150</span><span class="p">))</span>
</code></pre></div>
<p><img alt="explanation-graph-1" src="https://www.kloppenborg.ca/2022/11/explaining-basis-values/explaining-basis-values_files/figure-markdown/explanation-graph-1.png"></p>
<p>And that’s the graph at the beginning of this post.</p>Blogging with Quarto2022-07-18T00:00:00-04:002022-07-18T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2022-07-18:/2022/07/blogging-with-quarto/<p>I’ve recently started using <a href="https://quarto.org/">Quarto</a>, which is a new open source project
backed by <a href="https://www.rstudio.com/">RStudio</a>.
Quarto is a system for producing reports, presentations, books and blog posts.
It takes text formatted with markdown and code written in Python or R and produces
PDFs, <span class="caps">HTML</span> or several other formats that …</p><p>I’ve recently started using <a href="https://quarto.org/">Quarto</a>, which is a new open source project
backed by <a href="https://www.rstudio.com/">RStudio</a>.
Quarto is a system for producing reports, presentations, books and blog posts.
It takes text formatted with markdown and code written in Python or R and produces
PDFs, <span class="caps">HTML</span> or several other formats that contain the formatted text,
the code (optionally) and the outputs from that code.
In a lot of ways, Quarto is like R Markdown or Jupyter Notebooks.
Quarto uses Pandoc to actually convert document formats, and, unlike Jupyter Notebooks,
Quarto works quite well with version control software like git.</p>
<p>I’ve written about using R Markdown or Jupyter for <a href="https://www.kloppenborg.ca/2019/06/reproducibility/">reproducibility</a>
in engineering reports, and I’ve written about creating
<a href="https://www.kloppenborg.ca/2019/10/pandoc-report-templates/">custom document templates</a>
for reports written using Pandoc. Much of what I’ve written in those posts should
be applicable to Quarto.</p>
<p>To date, I’ve written two posts on this blog using Quarto:
<a href="https://www.kloppenborg.ca/2022/06/bow-stiffness/">Violin Bow Stiffness</a>
and <a href="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/">Shear of Adhesive Bonded Joints</a>.
This post describes some of my experiences using Quarto.
Overall, I’ve been quite happy with it.</p>
<h1>Blogging with Quarto and Pelican</h1>
<p>Quarto has built-in support for blogging with Hugo.
However, this blog is using <a href="https://blog.getpelican.com/">Pelican</a>, not Hugo.
A few tweaks are needed.</p>
<p>Since the Quarto posts are written in Markdown, they have <span class="caps">YAML</span> headers.
Here is the header for one of my blog posts:</p>
<div class="highlight"><pre><span></span><code><span class="nn">---</span><span class="w"></span>
<span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Shear of Adhesive Bonded Joints</span><span class="w"></span>
<span class="nt">date</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">2022-06-25</span><span class="w"></span>
<span class="nt">format</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">commonmark_x</span><span class="w"></span>
<span class="nt">keep-yaml</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span><span class="w"></span>
<span class="nt">Tags</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Engineering, Python, Adhesive Bonding</span><span class="w"></span>
<span class="nt">Category</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Posts</span><span class="w"></span>
<span class="nt">filters</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">attach-filter.lua</span><span class="w"></span>
<span class="nn">---</span><span class="w"></span>
</code></pre></div>
<p>Let’s go through the lines in this header one at a time:</p>
<ul>
<li><code>title</code>: Self-explanatory — the title of the post</li>
<li><code>date</code>: Also self-explanatory — the date of the post</li>
<li><code>format</code>: There are several output formats for Quarto. I’ve found that <code>commonmark_x</code> works
the best for Pelican. This output format produces a <code>.md</code> file in a format that (mostly)
works with Pelican.</li>
<li><code>keep-yaml</code>: Setting this option to <code>true</code> tells Quarto to copy the present <span class="caps">YAML</span> header to
the output <code>.md</code> file.</li>
<li><code>Tags</code>: This is an option used by Pelican. Since we’ve set <code>keep-yaml = true</code>, this gets
copied to the <code>.md</code> file that Pelican will process.</li>
<li><code>Category</code>: Another option used by Pelican.</li>
<li><code>filters</code>: We’ll talk about this next.</li>
</ul>
<h1>Lua Filters</h1>
<p>Pandoc uses something called a filter to alter the output.
These filters are written in a language called <a href="https://pandoc.org/lua-filters.html">Lua</a>.
In order for Pelican to include an image, the filename of the image needs to start with
<code>{attach}</code>. This tells Pelican to include the image file in the website output.</p>
<p>The following filter edits each image element when it’s being processed.
The name of the filter (<code>Image</code>) means that it applies to images.
This filter concatenates the string <code>{attach}</code> with the <code>src</code> attribute of the image and
stores the result in the <code>src</code> attribute of the resulting element.</p>
<div class="highlight"><pre><span></span><code><span class="kr">function</span> <span class="nf">Image</span> <span class="p">(</span><span class="n">elem</span><span class="p">)</span>
<span class="n">elem</span><span class="p">.</span><span class="n">src</span> <span class="o">=</span> <span class="s2">"{attach}"</span> <span class="o">..</span> <span class="n">elem</span><span class="p">.</span><span class="n">src</span>
<span class="kr">return</span> <span class="n">elem</span>
<span class="kr">end</span>
</code></pre></div>
<p>Similarly, for links, Pelican requires that links to internal files on the blog
start with <code>{filename}</code>. External links are used as-is. To do this, I use the following
filter, which applies to <code>Link</code> elements. It checks if the target of the link
starts with <code>http</code>. If so, it uses the link as-is. Otherwise, I assume that the
link is an internal link, so the filter prepends the target with <code>{filename}</code>.</p>
<div class="highlight"><pre><span></span><code><span class="kr">function</span> <span class="nf">Link</span> <span class="p">(</span><span class="n">elem</span><span class="p">)</span>
<span class="kr">if</span><span class="p">(</span> <span class="nb">string.find</span><span class="p">(</span><span class="n">elem</span><span class="p">.</span><span class="n">target</span><span class="p">,</span> <span class="s2">"http"</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="p">)</span>
<span class="kr">then</span>
<span class="kr">return</span> <span class="n">elem</span>
<span class="kr">else</span>
<span class="n">elem</span><span class="p">.</span><span class="n">target</span> <span class="o">=</span> <span class="s2">"{filename}"</span> <span class="o">..</span> <span class="n">elem</span><span class="p">.</span><span class="n">target</span>
<span class="kr">return</span> <span class="n">elem</span>
<span class="kr">end</span>
<span class="kr">end</span>
</code></pre></div>
<p>I’ve created a file called <code>attach-filter.lua</code> containing both of the filters above.
The <code>filters</code> line in the <span class="caps">YAML</span> header tells Quarto to use these filters when processing
the file.</p>Shear of Adhesive Bonded Joints2022-06-25T00:00:00-04:002022-06-25T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2022-06-25:/2022/06/bonded-joint-shear/<p>There are a lot of misconceptions about bonded joints. One of the
misconceptions that I’ve seen most often is that people think that the
average shear stress in a lap joint is predictive of the strength. This
same misconception is usually phrased as either:</p>
<ul>
<li>Doubling the overlap length of …</li></ul><p>There are a lot of misconceptions about bonded joints. One of the
misconceptions that I’ve seen most often is that people think that the
average shear stress in a lap joint is predictive of the strength. This
same misconception is usually phrased as either:</p>
<ul>
<li>Doubling the overlap length of a lap joint doubles the strength
(<strong>wrong!</strong>)</li>
<li>Calculate $P/A$ for the joint and make sure that the value is less
than the lap shear strength on the adhesive data-sheet (<strong>wrong!</strong>)</li>
</ul>
<p>In this post, I’m going to explain why these statements are incorrect.
I’m going to try to give you an understanding of how load transfer works
in an adhesive joint, and I’m going to share some Python code that
produces a first approximation of the stress distribution.</p>
<p>For simplicity, we’re going to ignore the effects of peel. Peel is the
tendency for the ends of a lap joint to separate. This can cause the
joint to fail in some cases, but considering the effects of peel
complicates the analysis of the joint — since the purpose of this post
is to give a basic understanding of the mechanics of the joint, I’m
going to ignore this complicating factor.</p>
<h2>The Joint</h2>
<p>In this post, I’m going to focus on a simple lap joint. In this type of
joint, two adherends overlap each other by a certain amount and there is
adhesive connecting the two adherends over the area in which they
overlap. These two adherends are then pulled apart. In this post, we’re
going to assume that the two adherends are homogeneous isotropic
materials (for example, sheet metal) of uniform thickness. This
joint is shown in the figure below. In the top part of the figure, we
see the unstressed joint, and in the bottom, we see the joint under load.</p>
<p><img alt="Schematic of lap joint" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/joint.svg"></p>
<p>Obviously, the deformation of the joint is exaggerated, but it allows us
to see what’s happening.</p>
<p>First, let’s look at the lower adherend. We see that the left edge of
the adherend is built-in (i.e. it can’t move). When load is applied, the
left portion of the lower adherend stretches a lot because it is
carrying the entirety of the reaction load.</p>
<p>As we move our gaze further to the right, but still focusing on the
lower adherend, we see that the further right we go, the less the
adherend is stretching. This is because the adhesive is transferring
load along the length of the joint. When we look at the right portion of
the lower adherend, it’s hardly stretched at all. Sure, it has <em>moved</em>
because the rest of the adherend has stretched, but it has barely deformed
itself, since it’s carrying almost no load.</p>
<p>If the two adherends are the same thickness, then symmetry will tell us
that the upper adherend behaves in the same way — but now it is the right
end of the upper adherend that stretches a lot and the left end that
doesn’t stretch.</p>
<p>Now, let’s turn our attention to the adhesive. Shear strain can be
thought of as an angle. At the very left edge of the adhesive, the shear
strain is quite large; in the middle of the adhesive, the shear strain
is moderate; and at the right edge of the adhesive, the shear strain is
quite large again. The relationship between shear stress and shear
strain of an adhesive is not linear, but nonetheless, a large strain
produces a large stress and a small strain produces a small stress. So,
the shear stress distribution in this joint is “U” shaped — there’s a
lot of stress at the ends and a smaller stress in the middle.</p>
<p>This “U” shaped shear stress distribution should be the first clue about
why using the average shear stress in the joint to predict failure might
not be the best idea.</p>
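<p>The linear-elastic model derived below turns out to have a well-known closed
form for a balanced joint (identical adherends): the peak-to-average shear
stress ratio is $\frac{\lambda L}{2} \coth \frac{\lambda L}{2}$, where
$\lambda^2 = \frac{G_A}{t_A} \left( \frac{1}{E_1^\prime t_1} + \frac{1}{E_2^\prime t_2} \right)$
— the classical Volkersen shear-lag estimate. As a quick sketch of why the
average stress is misleading, the Python snippet below (my own helper
function, using material values like those appearing later in this post)
evaluates that ratio for a few overlap lengths:</p>

```python
import numpy as np

def peak_to_avg_shear(E, t, Ga, ta, L):
    """Peak / average adhesive shear stress for a linear-elastic lap joint
    with identical adherends (classical Volkersen shear-lag estimate)."""
    lam = np.sqrt(Ga / ta * 2.0 / (E * t))  # shear-lag parameter, 1/length
    u = lam * L / 2.0
    return u / np.tanh(u)  # (lambda L / 2) * coth(lambda L / 2)

# illustrative values similar to those used later in this post
E = 10.5e6 / (1 - 0.33**2)          # plane-strain modulus, psi
t, Ga, ta = 0.063, 65500.0, 0.005   # in, psi, in
for L in (0.5, 1.0, 2.0):
    print(f"L = {L} in: peak/avg = {peak_to_avg_shear(E, t, Ga, ta, L):.2f}")
```

<p>For these values, doubling the overlap from 0.5 in to 1.0 in doubles the bond
area, but the peak-to-average ratio climbs from about 1.6 to about 3.0, so the
peak stress — and hence the failure load — barely improves.</p>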
<h2>A Linear Model of the Joint</h2>
<p>The actual shear stress-shear strain relationship for most adhesives is
non-linear, but we’ll start our analysis of a lap joint by making the
assumption that the adhesive is linear-elastic.</p>
<p>Let’s start with defining the variables that we’ll need. The variables
are shown in the following figure.</p>
<p><img alt="Lap shear joint variables" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/joint-variables.svg"></p>
<p>A force balance on the two adherends gives us:</p>
<p>$$
\frac{dN_1}{dx} - \tau = 0 $$</p>
<p>$$
\frac{dN_2}{dx} + \tau = 0 $$</p>
<p>We can find the deformation of the two adherends as follows:</p>
<p>$$
\frac{du_1}{dx} = \frac{N_1}{E_1^\prime t_1} $$</p>
<p>$$
\frac{du_2}{dx} = \frac{N_2}{E_2^\prime t_2} $$</p>
<p>Where $E_1^\prime$ and $E_2^\prime$ are the adherend plane-strain
elastic moduli.</p>
<p>The shear strain of the adhesive layer is given by:</p>
<p>$$
\gamma = \frac{1}{t_A} \left(u_1 - u_2\right) $$</p>
<p>We can differentiate this with respect to $x$ and then substitute in the
previous equations to get:</p>
<p>$$
\frac{d\gamma}{dx} = \frac{1}{t_A}\left(
\frac{du_1}{dx} - \frac{du_2}{dx}
\right) \
{} = \frac{1}{t_A}\left(
\frac{N_1}{E_1^\prime t_1} - \frac{N_2}{E_2^\prime t_2}
\right) $$</p>
<p>We can then differentiate this again with respect to $x$ and
substituting in the first equations, we get:</p>
<p>$$
\frac{d^2\gamma}{dx^2} = \frac{1}{t_A}\left(
\frac{dN_1}{dx}\frac{1}{E_1^\prime t_1}
- \frac{dN_2}{dx}\frac{1}{E_2^\prime t_2}
\right) \
{} = \tau\frac{1}{t_A}\left(
\frac{1}{E_1^\prime t_1}
+ \frac{1}{E_2^\prime t_2}
\right) $$</p>
<p>Remember that for now, we’re assuming that the adhesive is
linear-elastic. Thus:</p>
<p>$$
\tau = G_A \gamma $$</p>
<p>We can solve the second-order differential equation above, but we need
two boundary conditions. The boundary conditions that we choose are the
loads at the ends of the adherends. At the left end ($x=0$), the unit
load on the lower adherend ($N_2$) must be equal to the applied load
($P$) divided by the width ($w$) and the load on the upper adherend
($N_1$) must be zero. The opposite is true at the other end ($x=L$). Thus:</p>
<p>$$
\left.N_1\right|_{x=0} = 0 \
\left.N_2\right|_{x=0} = P/w $$</p>
<p>$$
\left.N_1\right|_{x=L} = P/w \
\left.N_2\right|_{x=L} = 0 $$</p>
<p>We can plug these into the equation for $\frac{d\gamma}{dx}$ at the two
ends of the joint and get the following boundary conditions that we will
enforce for the solution.</p>
<p>$$
\left.\frac{d\gamma}{dx}\right|_{x=0}
= \frac{1}{t_A}\left(
\frac{-P / w}{E_2^\prime t_2}
\right) $$</p>
<p>$$
\left.\frac{d\gamma}{dx}\right|_{x=L}
= \frac{1}{t_A}\left(
\frac{P / w}{E_1^\prime t_1}
\right) $$</p>
<p>There is a closed-form solution to this boundary value problem, which we
could find, but I think it’s more instructive to just find a numerical
solution — plus it’s easier to extend the numerical solution to the
case where the adhesive is non-linear. In order to find the numerical
solution, we’re going to use the Python package <code>scipy</code>, which includes
the function <code>solve_bvp()</code> for solving boundary-value problems. We’ll
start by importing the packages that we’ll use.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
<span class="kn">import</span> <span class="nn">scipy.integrate</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
</code></pre></div>
<p>Next, we’ll set the parameters for our solution. These include the
elastic moduli, thicknesses, overlap length and load.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">E1</span> <span class="o">=</span> <span class="mf">10.5e6</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="mf">0.33</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="n">E2</span> <span class="o">=</span> <span class="mf">10.5e6</span> <span class="o">/</span> <span class="p">(</span><span class="mi">1</span> <span class="o">-</span> <span class="mf">0.33</span><span class="o">**</span><span class="mi">2</span><span class="p">)</span>
<span class="n">t1</span> <span class="o">=</span> <span class="mf">0.063</span>
<span class="n">t2</span> <span class="o">=</span> <span class="mf">0.063</span>
<span class="n">Ga</span> <span class="o">=</span> <span class="mi">65500</span>
<span class="n">ta</span> <span class="o">=</span> <span class="mf">0.005</span>
<span class="n">L</span> <span class="o">=</span> <span class="mf">0.5</span>
<span class="n">w</span> <span class="o">=</span> <span class="mf">1.</span>
<span class="n">P</span> <span class="o">=</span> <span class="mi">2700</span>
</code></pre></div>
<p>The function <code>solve_bvp</code> requires two arguments: (i) a function that
returns the derivatives of the variables, and (ii) a function that
returns the residuals for the boundary conditions. We’re going to reduce
the second-order differential equation to a system of two first-order
differential equations by defining $y$ as follows. Based on this
definition, we can implement the two functions required by <code>solve_bvp</code>.</p>
<p>$$
y = \left[
\begin{matrix}
\frac{d\gamma}{dx} <span class="amp">&</span> \gamma
\end{matrix}
\right]^T $$</p>
<div class="cell-code highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">func1</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="n">D</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">matrix</span><span class="p">([</span>
<span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="n">Ga</span> <span class="o">/</span> <span class="n">ta</span> <span class="o">*</span> <span class="p">(</span><span class="mf">1.</span> <span class="o">/</span> <span class="p">(</span><span class="n">E1</span> <span class="o">*</span> <span class="n">t1</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.</span> <span class="o">/</span> <span class="p">(</span><span class="n">E2</span> <span class="o">*</span> <span class="n">t2</span><span class="p">))],</span>
<span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">]</span>
<span class="p">])</span>
<span class="k">return</span> <span class="n">D</span> <span class="o">@</span> <span class="n">y</span>
<span class="k">def</span> <span class="nf">bc1</span><span class="p">(</span><span class="n">ya</span><span class="p">,</span> <span class="n">yb</span><span class="p">):</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span>
<span class="n">ya</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="mf">1.</span> <span class="o">/</span> <span class="n">ta</span> <span class="o">*</span> <span class="p">(</span><span class="o">-</span><span class="n">P</span> <span class="o">/</span> <span class="n">w</span> <span class="o">/</span> <span class="p">(</span><span class="n">E2</span> <span class="o">*</span> <span class="n">t2</span><span class="p">)),</span>
<span class="n">yb</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="mf">1.</span> <span class="o">/</span> <span class="n">ta</span> <span class="o">*</span> <span class="p">(</span><span class="n">P</span> <span class="o">/</span> <span class="n">w</span> <span class="o">/</span> <span class="p">(</span><span class="n">E1</span> <span class="o">*</span> <span class="n">t1</span><span class="p">))</span>
<span class="p">])</span>
<span class="n">res1</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">integrate</span><span class="o">.</span><span class="n">solve_bvp</span><span class="p">(</span>
<span class="n">func1</span><span class="p">,</span>
<span class="n">bc1</span><span class="p">,</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">L</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="mi">50</span><span class="p">),</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">50</span><span class="p">))</span>
<span class="p">)</span>
</code></pre></div>
<p>The variable <code>res1</code> now contains the solution to our differential
equation. We can plot the shear strain ($\gamma$) over the length of the
joint as follows:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">res1</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">res1</span><span class="o">.</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,:])</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Linear Elastic Adhesive"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Strain, $</span><span class="se">\\</span><span class="s2">gamma$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-5-output-1.png"></p>
<p>Because we’re assuming that the adhesive is linear-elastic, we can find
the shear stress by simply multiplying the shear modulus $G_A$ by the
shear strain. The shear stress in the adhesive over the length of the
joint is thus:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">res1</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">Ga</span> <span class="o">*</span> <span class="n">res1</span><span class="o">.</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,:])</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Linear Elastic Adhesive"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Stress, $</span><span class="se">\\</span><span class="s2">tau$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-6-output-1.png"></p>
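<p>As a sanity check on the numerical solution, the linear-elastic case also has a
closed-form shear-lag solution (of the Volkersen type) when the two adherends are
identical. The following self-contained sketch re-solves the boundary-value problem
with the same inputs as above and compares it against that closed form; it is an
added verification, not part of the original derivation, and it uses
<code>np.vstack</code> rather than the <code>np.matrix</code> formulation above.</p>

```python
import numpy as np
import scipy.integrate

# Inputs from this post
E1 = E2 = 10.5e6 / (1 - 0.33**2)
t1 = t2 = 0.063
Ga, ta = 65500, 0.005
L, w, P = 0.5, 1., 2700

def func1(x, y):
    # y = [d(gamma)/dx, gamma]
    return np.vstack((Ga / ta * (1. / (E1 * t1) + 1. / (E2 * t2)) * y[1], y[0]))

def bc1(ya, yb):
    return np.array([
        ya[0] - 1. / ta * (-P / w / (E2 * t2)),
        yb[0] - 1. / ta * (P / w / (E1 * t1)),
    ])

res = scipy.integrate.solve_bvp(func1, bc1, np.linspace(0, L, 50), np.zeros((2, 50)))
tau_num = Ga * res.y[1]

# Closed form for equal adherends: gamma'' = lam**2 * gamma,
# with gamma'(0) = -C and gamma'(L) = +C
lam = np.sqrt(Ga / ta * (1. / (E1 * t1) + 1. / (E2 * t2)))
C = P / (w * ta * E1 * t1)
a = C * (1 + np.cosh(lam * L)) / (lam * np.sinh(lam * L))
b = -C / lam
tau_exact = Ga * (a * np.cosh(lam * res.x) + b * np.sinh(lam * res.x))

print(np.max(np.abs(tau_num - tau_exact)) / np.max(tau_exact))
```

<p>The two solutions should agree closely, and integrating the numerical shear
stress over the overlap should recover the applied load per unit width, $P/w$.</p>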
<h2>Stress-Strain Curve</h2>
<p>The shear stress-strain curve for most adhesives is linear at low
strain, but highly nonlinear above a certain value of strain. It’s
common to idealize the stress-strain curve for an adhesive as
elastic-perfectly plastic. The important parameters for the adhesive
stress-strain curve are the initial shear modulus ($G_A$) and the strain at
yield ($\gamma_y$), from which you can calculate the shear stress at
yield. The other important parameter is the ultimate strain, which we’ll
discuss later. The idealized stress-strain curve therefore looks like this:</p>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/stress-strain.svg"></p>
<p>The ordinary approach to solving for the stress distribution within a bonded
joint involves finding the points along the length of the joint at which
the adhesive transitions from elastic to plastic and then solving the
elastic and plastic portions of the joint separately. If you try to
naively solve the equations above with an elastic-perfectly plastic
adhesive model, you’ll get errors because the Jacobian becomes singular.
To keep this blog post simple, we’ll cheat a little
bit and give the stress-strain curve a very small slope above the
yield stress. This eliminates the numerical issues, and as long
as the slope is small enough, it won’t affect the results very much.</p>
<p>With this in mind, and considering that the strain could be positive or
negative, we implement a function to find the stress based on the strain
as follows:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">calc_tau</span><span class="p">(</span><span class="n">gamma</span><span class="p">):</span>
<span class="n">gamma_y</span> <span class="o">=</span> <span class="mf">0.09</span> <span class="c1"># the yield strain</span>
<span class="n">G_final</span> <span class="o">=</span> <span class="mi">1</span> <span class="c1"># a very small slope for the upper part of the curve</span>
<span class="n">sign</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">sign</span><span class="p">(</span><span class="n">gamma</span><span class="p">)</span>
<span class="k">if</span> <span class="n">np</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">gamma</span><span class="p">)</span> <span class="o"><=</span> <span class="n">gamma_y</span><span class="p">:</span>
<span class="n">tau_unsigned</span> <span class="o">=</span> <span class="n">Ga</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">gamma</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">tau_unsigned</span> <span class="o">=</span> <span class="n">Ga</span> <span class="o">*</span> <span class="n">gamma_y</span> <span class="o">+</span> \
<span class="n">G_final</span> <span class="o">*</span> <span class="p">(</span><span class="n">np</span><span class="o">.</span><span class="n">abs</span><span class="p">(</span><span class="n">gamma</span><span class="p">)</span> <span class="o">-</span> <span class="n">gamma_y</span><span class="p">)</span>
<span class="k">return</span> <span class="n">sign</span> <span class="o">*</span> <span class="n">tau_unsigned</span>
</code></pre></div>
<p>We’ll vectorize this function so that we can calculate an array of
stress values based on an array of strain values:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">calc_tau_vec</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">vectorize</span><span class="p">(</span><span class="n">calc_tau</span><span class="p">)</span>
</code></pre></div>
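<p>As an aside, <code>np.vectorize</code> is convenient but loops in Python; the same
bilinear curve can be written directly with <code>np.where</code>, which stays
vectorized inside NumPy. This alternative is a sketch using the same
<code>Ga</code>, yield strain, and post-yield slope as above; it is optional and
behaves identically:</p>

```python
import numpy as np

Ga = 65500       # adhesive shear modulus, as above
gamma_y = 0.09   # yield strain, as above
G_final = 1      # very small post-yield slope, as above

def calc_tau_np(gamma):
    """Bilinear shear stress-strain curve, vectorized with np.where."""
    gamma = np.asarray(gamma, dtype=float)
    abs_g = np.abs(gamma)
    tau = np.where(abs_g <= gamma_y,
                   Ga * abs_g,
                   Ga * gamma_y + G_final * (abs_g - gamma_y))
    return np.sign(gamma) * tau

print(calc_tau_np([-0.10, 0.05, 0.09]))
```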
<h2>A Nonlinear Model of the Joint</h2>
<p>Now that we have a function describing how the adhesive’s shear stress
depends on its shear strain, we can implement the
solution to the differential equation again. Since the boundary
conditions don’t depend on the behavior of the adhesive, we can re-use
the same function for calculating the boundary-condition residuals.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">func2</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="mf">1.</span> <span class="o">/</span> <span class="p">(</span><span class="n">E1</span> <span class="o">*</span> <span class="n">t1</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.</span> <span class="o">/</span> <span class="p">(</span><span class="n">E2</span> <span class="o">*</span> <span class="n">t2</span><span class="p">))</span> <span class="o">/</span> <span class="n">ta</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">row_stack</span><span class="p">((</span>
<span class="n">calc_tau_vec</span><span class="p">(</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="p">:])</span> <span class="o">*</span> <span class="n">b</span><span class="p">,</span>
<span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="p">:]</span>
<span class="p">))</span>
<span class="n">res2</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">integrate</span><span class="o">.</span><span class="n">solve_bvp</span><span class="p">(</span>
<span class="n">func2</span><span class="p">,</span>
<span class="n">bc1</span><span class="p">,</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">L</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="mi">50</span><span class="p">),</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">50</span><span class="p">))</span>
<span class="p">)</span>
</code></pre></div>
<p>Here is the strain solution that we get:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">res2</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">res2</span><span class="o">.</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,:])</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Elastic-Plastic Adhesive"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Strain, $</span><span class="se">\\</span><span class="s2">gamma$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-10-output-1.png"></p>
<p>And the corresponding adhesive shear stress solution is as follows:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">res2</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">calc_tau_vec</span><span class="p">(</span><span class="n">res2</span><span class="o">.</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,:]))</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Elastic-Plastic Adhesive"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Stress, $</span><span class="se">\\</span><span class="s2">tau$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-11-output-1.png"></p>
<p>We’ll overlay the linear and the elastic-plastic models on top of each
other to clearly show the differences between them. First, we
notice that the elastic-plastic model has flat spots in the stress
distribution where the adhesive has yielded; these occur near the ends
of the joint. Next, we notice that the middle portions of the two stress
distributions look similar, but shifted: for the elastic-plastic model,
the stress in the “trough” is higher because the ends of the joint carry
a smaller proportion of the total load.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">res1</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">Ga</span> <span class="o">*</span> <span class="n">res1</span><span class="o">.</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,:],</span> <span class="n">label</span><span class="o">=</span><span class="s2">"Elastic Adhesive"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">res2</span><span class="o">.</span><span class="n">x</span><span class="p">,</span> <span class="n">calc_tau_vec</span><span class="p">(</span><span class="n">res2</span><span class="o">.</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,:]),</span> <span class="n">label</span><span class="o">=</span><span class="s2">"Elastic-Plastic Adhesive"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Comparison of Shear Stress for Both Models"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Stress, $</span><span class="se">\\</span><span class="s2">tau$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-12-output-1.png"></p>
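<p>Whatever the adhesive model, equilibrium requires that the shear stress
integrated over the bondline equal the applied load $P$; this is what pushes the
“trough” upward once the ends yield. The following self-contained sketch
re-solves the elastic-plastic problem and checks that balance. It uses
<code>np.vstack</code> and an <code>np.where</code>-based version of
<code>calc_tau</code>, both conveniences of this sketch rather than the original
code:</p>

```python
import numpy as np
import scipy.integrate

# Inputs from this post
E1 = E2 = 10.5e6 / (1 - 0.33**2)
t1 = t2 = 0.063
Ga, ta = 65500, 0.005
L, w, P = 0.5, 1., 2700
gamma_y, G_final = 0.09, 1

def calc_tau(gamma):
    # Bilinear (nearly elastic-perfectly-plastic) adhesive curve
    abs_g = np.abs(gamma)
    tau = np.where(abs_g <= gamma_y,
                   Ga * abs_g,
                   Ga * gamma_y + G_final * (abs_g - gamma_y))
    return np.sign(gamma) * tau

def func2(x, y):
    b = (1. / (E1 * t1) + 1. / (E2 * t2)) / ta
    return np.vstack((calc_tau(y[1]) * b, y[0]))

def bc1(ya, yb):
    return np.array([
        ya[0] - 1. / ta * (-P / w / (E2 * t2)),
        yb[0] - 1. / ta * (P / w / (E1 * t1)),
    ])

res2 = scipy.integrate.solve_bvp(func2, bc1, np.linspace(0, L, 50), np.zeros((2, 50)))
tau2 = calc_tau(res2.y[1])

# Trapezoidal integration of tau over the overlap; should recover P
transferred = w * np.sum(0.5 * (tau2[1:] + tau2[:-1]) * np.diff(res2.x))
print(transferred)
```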
<h2>Exploration</h2>
<p>We’ll create a function that takes several of the joint parameters as
arguments and returns the stress and strain distributions. We’ll use
this function to explore the effect of some of the joint parameters.
We’re only going to implement this for the elastic-plastic model.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">model</span><span class="p">(</span><span class="n">t1</span><span class="p">,</span> <span class="n">t2</span><span class="p">,</span> <span class="n">ta</span><span class="p">,</span> <span class="n">L</span><span class="p">,</span> <span class="n">P</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">ode</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">):</span>
<span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="mf">1.</span> <span class="o">/</span> <span class="p">(</span><span class="n">E1</span> <span class="o">*</span> <span class="n">t1</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.</span> <span class="o">/</span> <span class="p">(</span><span class="n">E2</span> <span class="o">*</span> <span class="n">t2</span><span class="p">))</span> <span class="o">/</span> <span class="n">ta</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">row_stack</span><span class="p">((</span>
<span class="n">calc_tau_vec</span><span class="p">(</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="p">:])</span> <span class="o">*</span> <span class="n">b</span><span class="p">,</span>
<span class="n">y</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="p">:]</span>
<span class="p">))</span>
<span class="k">def</span> <span class="nf">bc</span><span class="p">(</span><span class="n">ya</span><span class="p">,</span> <span class="n">yb</span><span class="p">):</span>
<span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span>
<span class="n">ya</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="mf">1.</span> <span class="o">/</span> <span class="n">ta</span> <span class="o">*</span> <span class="p">(</span><span class="o">-</span><span class="n">P</span> <span class="o">/</span> <span class="n">w</span> <span class="o">/</span> <span class="p">(</span><span class="n">E2</span> <span class="o">*</span> <span class="n">t2</span><span class="p">)),</span>
<span class="n">yb</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">-</span> <span class="mf">1.</span> <span class="o">/</span> <span class="n">ta</span> <span class="o">*</span> <span class="p">(</span><span class="n">P</span> <span class="o">/</span> <span class="n">w</span> <span class="o">/</span> <span class="p">(</span><span class="n">E1</span> <span class="o">*</span> <span class="n">t1</span><span class="p">))</span>
<span class="p">])</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">integrate</span><span class="o">.</span><span class="n">solve_bvp</span><span class="p">(</span>
<span class="n">ode</span><span class="p">,</span>
<span class="n">bc</span><span class="p">,</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">L</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="mi">50</span><span class="p">),</span>
<span class="n">y</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">2</span><span class="p">,</span> <span class="mi">50</span><span class="p">))</span>
<span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">x</span>
<span class="n">gamma</span> <span class="o">=</span> <span class="n">res</span><span class="o">.</span><span class="n">y</span><span class="p">[</span><span class="mi">1</span><span class="p">,:]</span>
<span class="n">tau</span> <span class="o">=</span> <span class="n">calc_tau_vec</span><span class="p">(</span><span class="n">gamma</span><span class="p">)</span>
<span class="k">return</span> <span class="n">x</span><span class="p">,</span> <span class="n">gamma</span><span class="p">,</span> <span class="n">tau</span>
</code></pre></div>
<p>First, we’ll keep all of the parameters constant except the load. This
shows how the stress distribution changes as we
increase the load. The results aren’t surprising. At low loads, the
joint is fully elastic. As the load increases, the adhesive at the
ends of the overlap starts to yield. As the load increases further, the
yielded regions grow and the “trough” gets shallower. Finally, the joint
becomes fully plastic. At this point, the joint would surely fail, but
since our model doesn’t check for failure, we don’t see this.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="k">for</span> <span class="n">Pi</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mi">1750</span><span class="p">,</span> <span class="mi">2950</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="mi">4</span><span class="p">):</span>
<span class="n">x_i</span><span class="p">,</span> <span class="n">gamma_i</span><span class="p">,</span> <span class="n">tau_i</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span>
<span class="n">t1</span><span class="o">=</span><span class="n">t1</span><span class="p">,</span> <span class="n">t2</span><span class="o">=</span><span class="n">t2</span><span class="p">,</span> <span class="n">ta</span><span class="o">=</span><span class="n">ta</span><span class="p">,</span> <span class="n">L</span><span class="o">=</span><span class="n">L</span><span class="p">,</span> <span class="n">P</span><span class="o">=</span><span class="n">Pi</span>
<span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_i</span><span class="p">,</span> <span class="n">tau_i</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s2">"P=</span><span class="si">{</span><span class="n">Pi</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Shear Stress With Various Loads"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Stress, $</span><span class="se">\\</span><span class="s2">tau$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-14-output-1.png"></p>
<p>Next, we’ll see what happens when we change the thickness of the upper
adherend. In this example, the lower adherend has a thickness of
$t_2=0.063$ and we vary the thickness of the upper adherend ($t_1$) from
half this thickness to four times this thickness. As we can see, this
changes the length of the two plastic zones: in the extreme case of
$t_1=0.250$, there is no plastic zone on the right because the adherend
carrying the load at the right end of the joint is so stiff.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="k">for</span> <span class="n">t1_i</span> <span class="ow">in</span> <span class="p">[</span><span class="mf">0.032</span><span class="p">,</span> <span class="mf">0.063</span><span class="p">,</span> <span class="mf">.125</span><span class="p">,</span> <span class="mf">.250</span><span class="p">]:</span>
<span class="n">x_i</span><span class="p">,</span> <span class="n">gamma_i</span><span class="p">,</span> <span class="n">tau_i</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span>
<span class="n">t1</span><span class="o">=</span><span class="n">t1_i</span><span class="p">,</span> <span class="n">t2</span><span class="o">=</span><span class="n">t2</span><span class="p">,</span> <span class="n">ta</span><span class="o">=</span><span class="n">ta</span><span class="p">,</span> <span class="n">L</span><span class="o">=</span><span class="n">L</span><span class="p">,</span> <span class="n">P</span><span class="o">=</span><span class="n">P</span>
<span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_i</span><span class="p">,</span> <span class="n">tau_i</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s2">"t1=</span><span class="si">{</span><span class="n">t1_i</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Various Upper Adherend Thicknesses"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Stress, $</span><span class="se">\\</span><span class="s2">tau$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-15-output-1.png"></p>
<p>Finally, we’ll see the effect of changing the overlap length. This time,
we’re going to vary the overlap length $L$ and keep the <em>average shear
stress</em> ($P/A$) constant.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="k">for</span> <span class="n">Li</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
<span class="n">x_i</span><span class="p">,</span> <span class="n">gamma_i</span><span class="p">,</span> <span class="n">tau_i</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span>
<span class="n">t1</span><span class="o">=</span><span class="n">t1</span><span class="p">,</span> <span class="n">t2</span><span class="o">=</span><span class="n">t2</span><span class="p">,</span> <span class="n">ta</span><span class="o">=</span><span class="n">ta</span><span class="p">,</span> <span class="n">L</span><span class="o">=</span><span class="n">Li</span><span class="p">,</span> <span class="n">P</span><span class="o">=</span><span class="mi">5600</span><span class="o">*</span><span class="n">Li</span>
<span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_i</span><span class="p">,</span> <span class="n">tau_i</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s2">"L=</span><span class="si">{</span><span class="n">Li</span><span class="si">}</span><span class="s2">, P=</span><span class="si">{</span><span class="mi">5600</span><span class="o">*</span><span class="n">Li</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Various Lap Lengths, Constant $P/A$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Stress, $</span><span class="se">\\</span><span class="s2">tau$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-16-output-1.png"></p>
<p>Here, we see that for all three overlap lengths considered, the adhesive
at the ends of the lap is plastic and that there’s an elastic “trough”
in the middle of each joint. At this point, we might be tempted to
declare that all of the joints are able to carry at least the same
average shear stress ($P/A$), but before we do so, let’s look at the
shear strain in the adhesive layer for each of these cases.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="k">for</span> <span class="n">Li</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">linspace</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="mf">1.5</span><span class="p">,</span> <span class="n">num</span><span class="o">=</span><span class="mi">3</span><span class="p">):</span>
<span class="n">x_i</span><span class="p">,</span> <span class="n">gamma_i</span><span class="p">,</span> <span class="n">tau_i</span> <span class="o">=</span> <span class="n">model</span><span class="p">(</span>
<span class="n">t1</span><span class="o">=</span><span class="n">t1</span><span class="p">,</span> <span class="n">t2</span><span class="o">=</span><span class="n">t2</span><span class="p">,</span> <span class="n">ta</span><span class="o">=</span><span class="n">ta</span><span class="p">,</span> <span class="n">L</span><span class="o">=</span><span class="n">Li</span><span class="p">,</span> <span class="n">P</span><span class="o">=</span><span class="mi">5600</span><span class="o">*</span><span class="n">Li</span>
<span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_i</span><span class="p">,</span> <span class="n">gamma_i</span><span class="p">,</span> <span class="n">label</span><span class="o">=</span><span class="sa">f</span><span class="s2">"L=</span><span class="si">{</span><span class="n">Li</span><span class="si">}</span><span class="s2">, P=</span><span class="si">{</span><span class="mi">5600</span><span class="o">*</span><span class="n">Li</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Various Lap Lengths, Constant $P/A$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Shear Strain, $</span><span class="se">\\</span><span class="s2">gamma$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"$x$"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">legend</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="" src="https://www.kloppenborg.ca/2022/06/bonded-joint-shear/bonded-joint-shear_files/figure-commonmark_x/cell-17-output-1.png"></p>
<p>Here we see that the shear strain in the adhesive at the ends of the
longest joint is almost 0.9. Think about what that means: the “top” of
the adhesive layer has moved sideways relative to the “bottom” of the
layer by an amount almost equal to the thickness of the layer. In other
words, that’s a <strong>huge</strong> amount of strain.</p>
<p>The ultimate shear strain is going to depend on the type of adhesive
we’re using, as well as the environmental conditions (temperature,
moisture content, etc.). For a lot of adhesives, the ultimate strain is
going to be somewhere in the range of $0.2$ to $0.6$. So, in the three
examples shown here, the first overlap length ($L=0.5$) can probably
carry this value of $P/A$, the second overlap length ($L=1.0$) might be
able to carry it, but the third overlap length ($L=1.5$) almost
certainly will fail. This is the reason that you can’t use the average
shear stress ($P/A$) to size lap joints.</p>
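<p>To make the argument concrete, here is a minimal sketch of the screening logic just described. Only the roughly 0.9 strain for the longest joint and the 0.2 to 0.6 allowable band come from the discussion above; the peak strains for the two shorter joints are illustrative placeholders read off the strain plot.</p>

```python
# Screening lap joints by peak adhesive shear strain rather than average
# shear stress.  Allowable band from the text; the peak strains for L=0.5
# and L=1.0 are illustrative guesses read off the plot above.
GAMMA_ULT_LOW, GAMMA_ULT_HIGH = 0.2, 0.6

joints = [(0.5, 0.15), (1.0, 0.4), (1.5, 0.9)]  # (overlap length, peak strain)

verdicts = {}
for L, gamma_max in joints:
    if gamma_max <= GAMMA_ULT_LOW:
        verdicts[L] = "probably OK"
    elif gamma_max <= GAMMA_ULT_HIGH:
        verdicts[L] = "marginal; depends on the adhesive"
    else:
        verdicts[L] = "almost certainly fails"
    print(f"L={L}: peak strain {gamma_max} -> {verdicts[L]}")
```

<p>Note that all three joints carry the same $P/A$; only the peak strain separates them.</p>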
<p>If you want to play around with this model, I’ve created a
<a href="https://www.kloppenborg.ca/adhesive-lap-no-peel/">widget</a> that implements this model.</p>Violin Bow Stiffness2022-06-15T00:00:00-04:002022-06-15T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2022-06-15:/2022/06/bow-stiffness/<p>I’ve made a few violin bows and a couple cello bows. I’m very much a
novice bow maker, but I’m learning. As I’m an engineer, I’m naturally
trying to apply engineering principles to bow making, which isn’t
necessarily easy since violin bows are actually …</p><p>I’ve made a few violin bows and a couple cello bows. I’m very much a
novice bow maker, but I’m learning. As I’m an engineer, I’m naturally
trying to apply engineering principles to bow making, which isn’t
necessarily easy since violin bows are actually very complex, despite
looking quite simple.</p>
<p>The stiffness of a bow affects what the player is able to do with it. If
a bow is too stiff, it becomes nearly unplayable; if it’s too soft, the
player can’t apply much force to the string before the stick bottoms out
and contacts the string (normally the hair of the bow contacts the
string). The stiffness affects how much camber the bow maker must add to
the stick. The wrong combination of stiffness and camber can lead to a
torsional-bending buckling mode, which will make the bow unplayable. The
mass and mass distribution of the bow have a large effect on playability.
The aesthetics of the bow are also important. As I said, a bow is
quite complex.</p>
<p>The “standard” wood for making violin bows has been pernambuco for the
past 250 years. However, the tree that produces this wood is endangered
and hence this wood is difficult to obtain. I’ve been making bows out of
other types of wood — mostly ipe and snakewood. In order for a bow made
from ipe to have the same stiffness as a bow made from pernambuco, the
dimensions need to be altered. Hence, a good understanding of the
relationship between the taper of the stick and the resulting stiffness is important.</p>
<h1>Taper</h1>
<p>Henry Saint-George provides a procedure for calculating the taper of a
bow based on measurements of Tourte bows (<a href='#Saint-George_1896' id='ref-Saint-George_1896-1'>
SaintGeorge (1896)
</a>). In
this procedure, the bow is divided into 12 (unequal) segments. Referring
to the figure below (reproduced from Saint-George’s book), line <span class="caps">AC</span> is
constructed perpendicular to the bow with a length of 110 mm. A second
line <span class="caps">BD</span> is constructed perpendicular to the stick at the other end.
Saint-George indicates that the line <span class="caps">BD</span> is 22 mm when the total length
(<span class="caps">AB</span>) is 700 mm. A compass is used to draw the arc Ce. A line
perpendicular to the stick is then constructed starting from point e and
terminating at the line <span class="caps">CD</span>. The compass is re-set to draw the arc fg and
the process is repeated. The points A, e, g, i, k, etc. are the points
at which the diameter of the bow is set. At points A and e, the diameters
are set equal to one another. At points y and B, they are equal to
another fixed value. The diameters at the remaining points are each
decremented by a fixed value. But, since those points are not uniformly
spaced, the taper is not linear, but instead accelerates along the
length of the stick.</p>
<p><img alt="TaperProcedure" src="https://www.kloppenborg.ca/2022/06/bow-stiffness/images/TaperProcedure.png"></p>
<p>This procedure seems quite complicated. However, the keen reader might
recognize that the points along the stick form a geometric series, and
that the values 22 mm and 700 mm cannot both be taken as fixed: if you
change the length of the bow (which
affects the slope of the line <span class="caps">CD</span>), you also need to change the length of
line <span class="caps">BD</span>, otherwise the procedure described above will not produce the
correct overall length.</p>
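<p>This dependence is easy to check numerically. The sketch below (stdlib Python only; the function names are mine) solves for the common ratio $r$ and then computes the implied length of line <span class="caps">BD</span>, which by the construction above is $110\,r^{12}$, since each segment equals the perpendicular height at its start:</p>

```python
# Solve 110*(r**12 - 1)/(r - 1) = L for r by bisection (stdlib only), then
# compute the implied length of line BD = 110*r**12.
def series_length(r, C=110.0, n=12):
    """Total length of n geometric segments whose first segment is C."""
    return C * (r**n - 1.0) / (r - 1.0)

def solve_r(L, lo=0.5, hi=0.999, tol=1e-12):
    """Bisection: series_length is increasing in r on (0, 1)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if series_length(mid) < L:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

r = solve_r(700.0)
bd = 110.0 * r**12
print(f"r = {r:.6f}, implied BD = {bd:.1f} mm")
```

<p>With $L = 700$ mm this gives a <span class="caps">BD</span> of roughly 21.9 mm, which suggests Saint-George’s 22 mm is a rounded value that only holds for that particular bow length.</p>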
<p>The sum of the lengths of the 12 segments is given by:</p>
<p>$$
L = \sum_{k=0}^{11} C r^k = C\left(\frac{r^{12}-1}{r-1}\right) $$</p>
<p>Here, the value $C$ is selected as 110 mm and the value of $r$ needs to be
found based on the value of $L$ chosen. This can be done numerically in
Python. The following code does that, then computes the points and the
diameters of the bow:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">scipy.optimize</span>
<span class="n">length</span> <span class="o">=</span> <span class="mf">700.</span>
<span class="n">length_constant</span> <span class="o">=</span> <span class="mf">110.</span>
<span class="n">d_butt</span> <span class="o">=</span> <span class="mf">8.6</span>
<span class="n">d_head</span> <span class="o">=</span> <span class="mf">5.6</span>
<span class="n">r</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">optimize</span><span class="o">.</span><span class="n">root</span><span class="p">(</span>
<span class="k">lambda</span> <span class="n">r</span><span class="p">:</span> <span class="n">length_constant</span> <span class="o">*</span> <span class="p">(</span><span class="n">r</span><span class="o">**</span><span class="mi">12</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">r</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">-</span> <span class="n">length</span><span class="p">,</span>
<span class="mi">22</span>
<span class="p">)</span><span class="o">.</span><span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"Found r = </span><span class="si">{</span><span class="n">r</span><span class="si">}</span><span class="se">\n</span><span class="s2">"</span><span class="p">)</span>
<span class="n">x_points</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.</span><span class="p">]</span> <span class="o">*</span> <span class="mi">13</span>
<span class="n">d_points</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.</span><span class="p">]</span> <span class="o">*</span> <span class="mi">13</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">13</span><span class="p">):</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="n">d_points</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">d_butt</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">x_points</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">length_constant</span> <span class="o">*</span> <span class="p">(</span><span class="n">r</span><span class="o">**</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">r</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">d_points</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">d_butt</span> <span class="o">+</span> <span class="p">(</span><span class="n">d_head</span> <span class="o">-</span> <span class="n">d_butt</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mf">1.</span><span class="p">)</span> <span class="o">/</span> <span class="mf">10.</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="mi">12</span><span class="p">:</span>
<span class="n">d_points</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">d_head</span>
<span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">"x = </span><span class="si">{</span><span class="n">x_points</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="si">:</span><span class="s2">.1f</span><span class="si">}</span><span class="s2">, d = </span><span class="si">{</span><span class="n">d_points</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="si">:</span><span class="s2">.2f</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>Found r = 0.8741349707251251
x = 0.0, d = 8.60
x = 110.0, d = 8.60
x = 206.2, d = 8.30
x = 290.2, d = 8.00
x = 363.7, d = 7.70
x = 427.9, d = 7.40
x = 484.0, d = 7.10
x = 533.1, d = 6.80
x = 576.0, d = 6.50
x = 613.5, d = 6.20
x = 646.3, d = 5.90
x = 675.0, d = 5.60
x = 700.0, d = 5.60
</code></pre></div>
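<p>As a quick sanity check on the claim that the points form a geometric series, we can verify (using the $x$ values printed above) that each segment is about $0.874$ times the length of the one before it:</p>

```python
# Consecutive segment lengths from the tabulated x values; each ratio should
# be approximately r = 0.8741, to within the rounding of the printed table.
x = [0.0, 110.0, 206.2, 290.2, 363.7, 427.9, 484.0,
     533.1, 576.0, 613.5, 646.3, 675.0, 700.0]
segments = [b - a for a, b in zip(x, x[1:])]
ratios = [s2 / s1 for s1, s2 in zip(segments, segments[1:])]
print([round(q, 3) for q in ratios])
```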
<p>We can plot the diameter of the stick:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_points</span><span class="p">,</span> <span class="n">d_points</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Bow Diameter"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="cell-3-output-1" src="https://www.kloppenborg.ca/2022/06/bow-stiffness/bow-stiffness_files/figure-commonmark_x/cell-3-output-1.png"></p>
<h1>Stiffness</h1>
<h2>Section Properties</h2>
<p>Bows are either (approximately) round or octagonal in cross-section. The
area moment of inertia of each of these are as follows (<a href='#MachinerysHandbook' id='ref-MachinerysHandbook-1'>
Oberg et al. (2000)
</a>):</p>
<table>
<thead>
<tr>
<th>Shape</th>
<th>Area Moment of Inertia</th>
</tr>
</thead>
<tbody>
<tr>
<td>Circle</td>
<td>$\frac{\pi d^4}{64} = 0.0490874 d^4$</td>
</tr>
<tr>
<td>Octagon</td>
<td>$\frac{2 d^2 \tan\frac{\pi}{8}}{12}\left[\frac{d^2 \left(1 + 2 \cos^2\frac{\pi}{8}\right)}{4\cos^2\frac{\pi}{8}}\right] = 0.0547379 d^4$</td>
</tr>
</tbody>
</table>
<p>Of course, when determining the stiffness of the bow, the modulus of
elasticity also needs to be known. From my research, the modulus of
elasticity of pernambuco is about 30 GPa. From my measurements, the
modulus of elasticity of ipe is about 20 GPa.</p>
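<p>Putting the table and the moduli together, a short sketch (the function names and the millimetre/megapascal units are my choices, not from the original) compares the bending stiffness of round and octagonal sections of the same width:</p>

```python
import math

# Area moments of inertia from the table above.  Note that 0.0490874 is just
# pi/64, and the octagonal coefficient evaluates to about 0.0547379.
def I_circle(d):
    return math.pi * d**4 / 64.0

def I_octagon(d):
    # d is the width across flats
    t = math.tan(math.pi / 8.0)
    c2 = math.cos(math.pi / 8.0) ** 2
    return (2.0 * d**2 * t / 12.0) * (d**2 * (1.0 + 2.0 * c2) / (4.0 * c2))

E_pernambuco = 30e3  # MPa (30 GPa)
E_ipe = 20e3         # MPa (20 GPa)
d = 8.6              # mm, the butt diameter from the taper above

ratio = I_octagon(d) / I_circle(d)
print(f"EI round, pernambuco: {E_pernambuco * I_circle(d):.3e} N*mm^2")
print(f"EI round, ipe:        {E_ipe * I_circle(d):.3e} N*mm^2")
print(f"Octagonal/round moment of inertia: {ratio:.3f}")
```

<p>For the same width, an octagonal stick has about 11.5% more moment of inertia than a round one, so an equally stiff round stick needs a slightly larger diameter.</p>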
<h2>Finite Element Method</h2>
<p>In order to determine the stiffness of the stick, we’ll use the finite
element method with tapered beam elements. This analysis will be done in
two dimensions. We’ll define a node at each of the <code>x</code> points found in
the previous calculation of bow taper with a tapered beam element
connecting adjacent nodes. The diameter (or width across flats in the
case of an octagonal cross-section) is known at each of the nodes. Our
model will assume that the variation in the diameter is linear between nodes.</p>
<p>The following derivation is based on Chapter 3 from <a href='#CookFEA' id='ref-CookFEA-1'>
Cook et al. (2001)
</a>, but
differs since the elements are tapered beams instead of constant section beams.</p>
<p>Each node will have two degrees of freedom: a transverse displacement
and a rotation. The degrees of freedom associated with a single element
(which connects two nodes) are thus:</p>
<p>$$
[d] = \left[
\matrix{
\nu_1 &amp; \theta_1 &amp; \nu_2 &amp; \theta_2
}
\right] $$</p>
<p>Some of the algebra that we’ll use in the following derivation gets a
bit tedious, so we’ll use the symbolic mathematics package <code>sympy</code> to
help us:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">sympy</span>
<span class="c1"># Due to the way that my blogging platform works, we need to</span>
<span class="c1"># define a new function for printing symbolic math:</span>
<span class="k">def</span> <span class="nf">sym_print</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
<span class="nb">print</span><span class="p">(</span><span class="s1">'$$</span><span class="si">{}</span><span class="s1">$$'</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sympy</span><span class="o">.</span><span class="n">printing</span><span class="o">.</span><span class="n">latex</span><span class="p">(</span><span class="n">x</span><span class="p">)))</span>
</code></pre></div>
<p>The curvature-displacement matrix $[B]$ for our element (the second
derivative of the Hermite shape functions) is a function of the element
length $L$ and the position along the element $x$, and is given by:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">L</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s2">"L"</span><span class="p">)</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span>
<span class="n">B</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">Matrix</span><span class="p">([[</span>
<span class="o">-</span><span class="mi">6</span> <span class="o">/</span> <span class="n">L</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">12</span> <span class="o">*</span> <span class="n">x</span> <span class="o">/</span> <span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="p">,</span>
<span class="o">-</span><span class="mi">4</span> <span class="o">/</span> <span class="n">L</span> <span class="o">+</span> <span class="mi">6</span> <span class="o">*</span> <span class="n">x</span> <span class="o">/</span> <span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="p">,</span>
<span class="mi">6</span> <span class="o">/</span> <span class="n">L</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">12</span> <span class="o">*</span> <span class="n">x</span> <span class="o">/</span> <span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="p">,</span>
<span class="o">-</span><span class="mi">2</span> <span class="o">/</span> <span class="n">L</span> <span class="o">+</span> <span class="mi">6</span> <span class="o">*</span> <span class="n">x</span> <span class="o">/</span> <span class="n">L</span><span class="o">**</span><span class="mi">2</span>
<span class="p">]])</span>
<span class="n">sym_print</span><span class="p">(</span><span class="n">B</span><span class="p">)</span>
</code></pre></div>
<p>$$\left[\begin{matrix}- \frac{6}{L^{2}} + \frac{12 x}{L^{3}} <span class="amp">&</span> - \frac{4}{L} + \frac{6 x}{L^{2}} <span class="amp">&</span> \frac{6}{L^{2}} - \frac{12 x}{L^{3}} <span class="amp">&</span> - \frac{2}{L} + \frac{6 x}{L^{2}}\end{matrix}\right]$$</p>
<p>For the purpose of stiffness calculations, we’re idealizing the taper of
the bow so that within each element the taper is linear. This means that
the diameter of the stick at the point $x$ is given by the following.
Note that in this section, $x$ and $L$ refer to the distance along the
element and the length of the element, respectively, rather
than the dimensions of the bow.</p>
<p>$$
d = d_1 + \frac{x}{L}\left(d_2 - d_1\right) $$</p>
<p>where $d_1$ and $d_2$ are the diameters at nodes 1 and 2, respectively.
So that we don’t have to carry around so many variables, we’ll define
the variable $\beta$ such that:</p>
<p>$$
d = d_1 + \beta x $$</p>
<p>As we found earlier, for both circular sections and octagonal sections,
the moment of inertia ($I$) is a function of $d^4$. We’ll define a new
variable $\alpha$ such that:</p>
<p>$$
<span class="caps">EI</span> = \alpha d^4 $$</p>
<p>Combining the previous two equations and entering this into <code>sympy</code>, we get:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">alpha</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s2">"</span><span class="se">\\</span><span class="s2">alpha"</span><span class="p">)</span>
<span class="n">d1</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s2">"d_1"</span><span class="p">)</span>
<span class="n">beta</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">var</span><span class="p">(</span><span class="s2">"</span><span class="se">\\</span><span class="s2">beta"</span><span class="p">)</span>
<span class="n">EI</span> <span class="o">=</span> <span class="n">alpha</span> <span class="o">*</span> <span class="p">(</span><span class="n">d1</span> <span class="o">+</span> <span class="n">beta</span> <span class="o">*</span> <span class="n">x</span><span class="p">)</span><span class="o">**</span><span class="mi">4</span>
<span class="n">sym_print</span><span class="p">(</span><span class="n">EI</span><span class="p">)</span>
</code></pre></div>
<p>$$\alpha \left(\beta x + d_{1}\right)^{4}$$</p>
<p>The stiffness matrix for the element is given by:</p>
<p>$$
[k] = \int_0^L \left[B\right]^T EI \left[B\right] dx $$</p>
<p>Solving and simplifying this using <code>sympy</code>, we get the following. The
stiffness matrix is a 4x4 matrix that is quite complex, so we’ll show
one column at a time in this post:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">k</span> <span class="o">=</span> <span class="n">sympy</span><span class="o">.</span><span class="n">simplify</span><span class="p">(</span>
<span class="n">sympy</span><span class="o">.</span><span class="n">integrate</span><span class="p">(</span><span class="n">B</span><span class="o">.</span><span class="n">T</span> <span class="o">*</span> <span class="n">EI</span> <span class="o">*</span> <span class="n">B</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">L</span><span class="p">))</span>
<span class="p">)</span>
</code></pre></div>
<div class="cell-code highlight"><pre><span></span><code><span class="c1"># The first column</span>
<span class="n">sym_print</span><span class="p">(</span><span class="n">k</span><span class="p">[:,</span><span class="mi">0</span><span class="p">])</span>
</code></pre></div>
<p>$$\left[\begin{matrix}\frac{12 \alpha \left(11 L^{4} \beta^{4} + 49 L^{3} \beta^{3} d_{1} + 84 L^{2} \beta^{2} d_{1}^{2} + 70 L \beta d_{1}^{3} + 35 d_{1}^{4}\right)}{35 L^{3}}\frac{2 \alpha \left(19 L^{4} \beta^{4} + 84 L^{3} \beta^{3} d_{1} + 147 L^{2} \beta^{2} d_{1}^{2} + 140 L \beta d_{1}^{3} + 105 d_{1}^{4}\right)}{35 L^{2}}\frac{12 \alpha \left(- 11 L^{4} \beta^{4} - 49 L^{3} \beta^{3} d_{1} - 84 L^{2} \beta^{2} d_{1}^{2} - 70 L \beta d_{1}^{3} - 35 d_{1}^{4}\right)}{35 L^{3}}\frac{2 \alpha \left(47 L^{4} \beta^{4} + 210 L^{3} \beta^{3} d_{1} + 357 L^{2} \beta^{2} d_{1}^{2} + 280 L \beta d_{1}^{3} + 105 d_{1}^{4}\right)}{35 L^{2}}\end{matrix}\right]$$</p>
<div class="cell-code highlight"><pre><span></span><code><span class="c1"># The second column</span>
<span class="n">sym_print</span><span class="p">(</span><span class="n">k</span><span class="p">[:,</span><span class="mi">1</span><span class="p">])</span>
</code></pre></div>
<p>$$\left[\begin{matrix}\frac{2 \alpha \left(19 L^{4} \beta^{4} + 84 L^{3} \beta^{3} d_{1} + 147 L^{2} \beta^{2} d_{1}^{2} + 140 L \beta d_{1}^{3} + 105 d_{1}^{4}\right)}{35 L^{2}}\frac{4 \alpha \left(3 L^{4} \beta^{4} + 14 L^{3} \beta^{3} d_{1} + 28 L^{2} \beta^{2} d_{1}^{2} + 35 L \beta d_{1}^{3} + 35 d_{1}^{4}\right)}{35 L}\frac{2 \alpha \left(- 19 L^{4} \beta^{4} - 84 L^{3} \beta^{3} d_{1} - 147 L^{2} \beta^{2} d_{1}^{2} - 140 L \beta d_{1}^{3} - 105 d_{1}^{4}\right)}{35 L^{2}}\frac{2 \alpha \left(13 L^{4} \beta^{4} + 56 L^{3} \beta^{3} d_{1} + 91 L^{2} \beta^{2} d_{1}^{2} + 70 L \beta d_{1}^{3} + 35 d_{1}^{4}\right)}{35 L}\end{matrix}\right]$$</p>
<div class="cell-code highlight"><pre><span></span><code><span class="c1"># The third column</span>
<span class="n">sym_print</span><span class="p">(</span><span class="n">k</span><span class="p">[:,</span><span class="mi">2</span><span class="p">])</span>
</code></pre></div>
<p>$$\left[\begin{matrix}\frac{12 \alpha \left(- 11 L^{4} \beta^{4} - 49 L^{3} \beta^{3} d_{1} - 84 L^{2} \beta^{2} d_{1}^{2} - 70 L \beta d_{1}^{3} - 35 d_{1}^{4}\right)}{35 L^{3}}\frac{2 \alpha \left(- 19 L^{4} \beta^{4} - 84 L^{3} \beta^{3} d_{1} - 147 L^{2} \beta^{2} d_{1}^{2} - 140 L \beta d_{1}^{3} - 105 d_{1}^{4}\right)}{35 L^{2}}\frac{12 \alpha \left(11 L^{4} \beta^{4} + 49 L^{3} \beta^{3} d_{1} + 84 L^{2} \beta^{2} d_{1}^{2} + 70 L \beta d_{1}^{3} + 35 d_{1}^{4}\right)}{35 L^{3}}\frac{2 \alpha \left(- 47 L^{4} \beta^{4} - 210 L^{3} \beta^{3} d_{1} - 357 L^{2} \beta^{2} d_{1}^{2} - 280 L \beta d_{1}^{3} - 105 d_{1}^{4}\right)}{35 L^{2}}\end{matrix}\right]$$</p>
<div class="cell-code highlight"><pre><span></span><code><span class="c1"># The fourth column</span>
<span class="n">sym_print</span><span class="p">(</span><span class="n">k</span><span class="p">[:,</span><span class="mi">3</span><span class="p">])</span>
</code></pre></div>
<p>$$\left[\begin{matrix}\frac{2 \alpha \left(47 L^{4} \beta^{4} + 210 L^{3} \beta^{3} d_{1} + 357 L^{2} \beta^{2} d_{1}^{2} + 280 L \beta d_{1}^{3} + 105 d_{1}^{4}\right)}{35 L^{2}}\frac{2 \alpha \left(13 L^{4} \beta^{4} + 56 L^{3} \beta^{3} d_{1} + 91 L^{2} \beta^{2} d_{1}^{2} + 70 L \beta d_{1}^{3} + 35 d_{1}^{4}\right)}{35 L}\frac{2 \alpha \left(- 47 L^{4} \beta^{4} - 210 L^{3} \beta^{3} d_{1} - 357 L^{2} \beta^{2} d_{1}^{2} - 280 L \beta d_{1}^{3} - 105 d_{1}^{4}\right)}{35 L^{2}}\frac{4 \alpha \left(17 L^{4} \beta^{4} + 77 L^{3} \beta^{3} d_{1} + 133 L^{2} \beta^{2} d_{1}^{2} + 105 L \beta d_{1}^{3} + 35 d_{1}^{4}\right)}{35 L}\end{matrix}\right]$$</p>
<p>We can now write a function that outputs the stiffness matrix for a
tapered beam element:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="nn">np</span>
</code></pre></div>
<div class="cell-code highlight"><pre><span></span><code><span class="k">def</span> <span class="nf">elm_k</span><span class="p">(</span><span class="n">L</span><span class="p">,</span> <span class="n">d1</span><span class="p">,</span> <span class="n">d2</span><span class="p">,</span> <span class="n">alpha</span><span class="p">):</span>
<span class="n">b</span> <span class="o">=</span> <span class="p">(</span><span class="n">d2</span> <span class="o">-</span> <span class="n">d1</span><span class="p">)</span> <span class="o">/</span> <span class="n">L</span>
<span class="k">return</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">alpha</span> <span class="o">/</span> <span class="p">(</span><span class="mi">35</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="p">)</span> <span class="o">*</span> <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">(</span>
<span class="p">[</span>
<span class="p">[</span>
<span class="mi">6</span><span class="o">*</span><span class="p">(</span><span class="mi">11</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">49</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">84</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">70</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">35</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="n">L</span><span class="o">*</span><span class="p">(</span><span class="mi">19</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">84</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">147</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">140</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">105</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="mi">6</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="mi">11</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">-</span> <span class="mi">49</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">-</span> <span class="mi">84</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">70</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">-</span> <span class="mi">35</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="n">L</span><span class="o">*</span><span class="p">(</span><span class="mi">47</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">210</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">357</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">280</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">105</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">)</span>
<span class="p">],</span>
<span class="p">[</span>
<span class="n">L</span><span class="o">*</span><span class="p">(</span><span class="mi">19</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">84</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">147</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">140</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">105</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="mi">2</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="mi">3</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">14</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">28</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">35</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">35</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="n">L</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="mi">19</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">-</span> <span class="mi">84</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">-</span> <span class="mi">147</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">140</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">-</span> <span class="mi">105</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="mi">13</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">56</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">91</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">70</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">35</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">)</span>
<span class="p">],</span>
<span class="p">[</span>
<span class="mi">6</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="mi">11</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">-</span> <span class="mi">49</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">-</span> <span class="mi">84</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">70</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">-</span> <span class="mi">35</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="n">L</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="mi">19</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">-</span> <span class="mi">84</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">-</span> <span class="mi">147</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">140</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">-</span> <span class="mi">105</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="mi">6</span><span class="o">*</span><span class="p">(</span><span class="mi">11</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">49</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">84</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">70</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">35</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="n">L</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="mi">47</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">-</span> <span class="mi">210</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">-</span> <span class="mi">357</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">280</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">-</span> <span class="mi">105</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">)</span>
<span class="p">],</span>
<span class="p">[</span>
<span class="n">L</span><span class="o">*</span><span class="p">(</span><span class="mi">47</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">210</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">357</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">280</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">105</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="mi">13</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">56</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">91</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">70</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">35</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="n">L</span><span class="o">*</span><span class="p">(</span><span class="o">-</span><span class="mi">47</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">-</span> <span class="mi">210</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">-</span> <span class="mi">357</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">-</span> <span class="mi">280</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">-</span> <span class="mi">105</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">),</span>
<span class="mi">2</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="p">(</span><span class="mi">17</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">4</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">4</span> <span class="o">+</span> <span class="mi">77</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">3</span><span class="o">*</span><span class="n">d1</span> <span class="o">+</span> <span class="mi">133</span><span class="o">*</span><span class="n">L</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">b</span><span class="o">**</span><span class="mi">2</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">105</span><span class="o">*</span><span class="n">L</span><span class="o">*</span><span class="n">b</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">3</span> <span class="o">+</span> <span class="mi">35</span><span class="o">*</span><span class="n">d1</span><span class="o">**</span><span class="mi">4</span><span class="p">)</span>
<span class="p">]</span>
<span class="p">]</span>
<span class="p">)</span>
</code></pre></div>
<h2>Stroup Test</h2>
<p>The Stroup Test is a way of measuring the stiffness of the stick of a bow.
In this test, the bow is mounted in a jig that supports the stick on two
rollers that are 575 mm apart. A transverse force of 2 lb is applied
mid-way between the two rollers and the deflection at the force
application point is measured. From what I can tell, a small number of
people advocated this test some time ago, but it has since become quite
uncommon; most makers assess the stiffness of a stick by feel. However,
the Stroup Test can easily be simulated with the finite element method
to compare the relative stiffness of sticks made from different
materials with different dimensions.</p>
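<p>As a sanity check on the finite element results, the Stroup Test can be
approximated analytically if we pretend the stick is prismatic: a simply
supported beam with a central point load deflects by δ = PL³/(48EI) at
mid-span. A minimal sketch, assuming a constant 8 mm diameter and the
30 GPa modulus used later (both illustrative numbers, not real bow
dimensions):</p>

```python
import math

def stroup_deflection_uniform(P_N, span_mm, E_MPa, d_mm):
    """Mid-span deflection (mm) of a simply supported, constant-diameter
    round beam under a central point load: delta = P L^3 / (48 E I)."""
    I = math.pi * d_mm**4 / 64.0  # second moment of area of a round section
    return P_N * span_mm**3 / (48.0 * E_MPa * I)

# 2 lbf (about 8.90 N) applied over the 575 mm Stroup Test span
delta = stroup_deflection_uniform(8.90, 575.0, 30e3, 8.0)
```

A real stick is tapered, so the finite element model below will give a
different (and more realistic) deflection than this uniform-beam estimate.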
<h2>Implementing the Stroup Test</h2>
<p>We already have a list of nodal locations. We’ll choose one of these
nodes as the location of one of the supports (we’ll use the second-last
node for this). We also need nodes at the correct locations for the
other support and for the load application point. Since these locations
generally won’t coincide with existing nodes, we’ll create new nodes
there, sub-dividing the existing elements. We can do this in Python as
follows:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">x_nodes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">d_nodes</span> <span class="o">=</span> <span class="p">[]</span>
<span class="n">x_s2</span> <span class="o">=</span> <span class="n">x_points</span><span class="p">[</span><span class="mi">11</span><span class="p">]</span>  <span class="c1"># second-last node</span>
<span class="n">x_s1</span> <span class="o">=</span> <span class="n">x_s2</span> <span class="o">-</span> <span class="mi">575</span>
<span class="n">x_l</span> <span class="o">=</span> <span class="n">x_s2</span> <span class="o">-</span> <span class="mi">575</span> <span class="o">/</span> <span class="mi">2</span>
<span class="n">nid_s1</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># storage for node ID of support #1</span>
<span class="n">nid_s2</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># storage for node ID of support #2</span>
<span class="n">nid_l</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="c1"># storage for node ID of load application</span>
<span class="n">tol</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">xa</span><span class="p">,</span> <span class="n">xb</span><span class="p">:</span> <span class="nb">abs</span><span class="p">(</span><span class="n">xa</span> <span class="o">-</span> <span class="n">xb</span><span class="p">)</span> <span class="o"><</span> <span class="mf">1e-3</span>
<span class="n">inside</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">xa</span><span class="p">,</span> <span class="n">xb</span><span class="p">:</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">xa</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">xb</span><span class="p">)</span> <span class="o"><</span> <span class="mi">0</span>
<span class="k">for</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">d1</span><span class="p">,</span> <span class="n">d2</span> <span class="ow">in</span> <span class="nb">zip</span><span class="p">(</span>
<span class="n">x_points</span><span class="p">,</span> <span class="n">x_points</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">d_points</span><span class="p">,</span> <span class="n">d_points</span><span class="p">[</span><span class="mi">1</span><span class="p">:]):</span>
<span class="n">x_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x1</span><span class="p">)</span>
<span class="n">d_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">d1</span><span class="p">)</span>
<span class="k">if</span> <span class="n">tol</span><span class="p">(</span><span class="n">x_s1</span><span class="p">,</span> <span class="n">x1</span><span class="p">):</span>
<span class="n">nid_s1</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="k">elif</span> <span class="n">inside</span><span class="p">(</span><span class="n">x_s1</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">):</span>
<span class="n">x_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x_s1</span><span class="p">)</span>
<span class="n">d_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">d1</span> <span class="o">+</span> <span class="p">(</span><span class="n">x_s1</span> <span class="o">-</span> <span class="n">x1</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">x2</span> <span class="o">-</span> <span class="n">x1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">d2</span> <span class="o">-</span> <span class="n">d1</span><span class="p">))</span>
<span class="n">nid_s1</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">tol</span><span class="p">(</span><span class="n">x_s2</span><span class="p">,</span> <span class="n">x1</span><span class="p">):</span>
<span class="n">nid_s2</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="k">elif</span> <span class="n">inside</span><span class="p">(</span><span class="n">x_s2</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">):</span>
<span class="n">x_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x_s2</span><span class="p">)</span>
<span class="n">d_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">d1</span> <span class="o">+</span> <span class="p">(</span><span class="n">x_s2</span> <span class="o">-</span> <span class="n">x1</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">x2</span> <span class="o">-</span> <span class="n">x1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">d2</span> <span class="o">-</span> <span class="n">d1</span><span class="p">))</span>
<span class="n">nid_s2</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="k">if</span> <span class="n">tol</span><span class="p">(</span><span class="n">x_l</span><span class="p">,</span> <span class="n">x1</span><span class="p">):</span>
<span class="n">nid_l</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="k">elif</span> <span class="n">inside</span><span class="p">(</span><span class="n">x_l</span><span class="p">,</span> <span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">):</span>
<span class="n">x_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x_l</span><span class="p">)</span>
<span class="n">d_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">d1</span> <span class="o">+</span> <span class="p">(</span><span class="n">x_l</span> <span class="o">-</span> <span class="n">x1</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="n">x2</span> <span class="o">-</span> <span class="n">x1</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">d2</span> <span class="o">-</span> <span class="n">d1</span><span class="p">))</span>
<span class="n">nid_l</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span>
<span class="n">x_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">x_points</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
<span class="n">d_nodes</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">d_points</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span>
</code></pre></div>
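<p>The diameter assigned to each inserted node is just linear interpolation
between the two bounding nodes. A tiny standalone check of that formula,
with made-up numbers:</p>

```python
def interp_diameter(x, x1, x2, d1, d2):
    """Linearly interpolate the diameter at x between nodes (x1, d1)
    and (x2, d2), as done above when a node is inserted inside an element."""
    return d1 + (x - x1) / (x2 - x1) * (d2 - d1)

# Made-up element: the diameter tapers from 8 mm to 6 mm over x = 0..10 mm
d_new = interp_diameter(4.0, 0.0, 10.0, 8.0, 6.0)  # -> 7.2
```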
<p>We can now build a stiffness matrix for the model. There are now 15
nodes and each node has 2 <span class="caps">DOF</span> (transverse displacement and rotation),
so the matrix will be 30 x 30. The matrix is banded, but since the model
is so small we’ll just store it as a dense array. We’ll assume that all
elements have a round cross-section, so I = πd⁴/64 (the factor 0.0490874
below is π/64), and that the material has a modulus of 30 GPa
(30 × 10³ MPa in the mm–N unit system used here).</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">k_model</span><span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">((</span><span class="mi">2</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">),</span> <span class="mi">2</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">)))</span>
<span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="p">(</span><span class="n">x1</span><span class="p">,</span> <span class="n">x2</span><span class="p">,</span> <span class="n">d1</span><span class="p">,</span> <span class="n">d2</span><span class="p">)</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span>
<span class="nb">zip</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">,</span> <span class="n">x_nodes</span><span class="p">[</span><span class="mi">1</span><span class="p">:],</span> <span class="n">d_nodes</span><span class="p">,</span> <span class="n">d_nodes</span><span class="p">[</span><span class="mi">1</span><span class="p">:])):</span>
<span class="c1"># Each element connects the two adjacent nodes</span>
<span class="n">k_elm</span> <span class="o">=</span> <span class="n">elm_k</span><span class="p">(</span>
<span class="n">L</span> <span class="o">=</span> <span class="n">x2</span> <span class="o">-</span> <span class="n">x1</span><span class="p">,</span>
<span class="n">d1</span> <span class="o">=</span> <span class="n">d1</span><span class="p">,</span>
<span class="n">d2</span> <span class="o">=</span> <span class="n">d2</span><span class="p">,</span>
<span class="n">alpha</span> <span class="o">=</span> <span class="mf">0.0490874</span> <span class="o">*</span> <span class="mf">30e3</span>
<span class="p">)</span>
<span class="k">for</span> <span class="n">ii</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
<span class="k">for</span> <span class="n">jj</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span>
<span class="n">k_model</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="n">ii</span><span class="p">,</span> <span class="n">i</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="n">jj</span><span class="p">]</span> <span class="o">+=</span> <span class="n">k_elm</span><span class="p">[</span><span class="n">ii</span><span class="p">,</span><span class="n">jj</span><span class="p">]</span>
</code></pre></div>
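<p>The assembly loop above is the standard direct stiffness method: each
element matrix is added into the block of the global matrix belonging to
its two nodes, so shared nodes accumulate contributions from both
neighbouring elements. A minimal sketch of the same idea using
one-<span class="caps">DOF</span> spring elements instead of beam elements:</p>

```python
import numpy as np

# Two unit-stiffness springs in series, connecting nodes 0-1 and 1-2.
# One DOF per node here (the beam model above has two DOF per node).
k_elm = np.array([[ 1.0, -1.0],
                  [-1.0,  1.0]])
n_nodes = 3
K = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes - 1):  # element i connects nodes i and i + 1
    K[i:i + 2, i:i + 2] += k_elm

# The shared node (node 1) picks up stiffness from both elements:
# K = [[ 1., -1.,  0.],
#      [-1.,  2., -1.],
#      [ 0., -1.,  1.]]
```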
<p>We can visualize the stiffness matrix. As expected, all of the non-zero
entries are near the diagonal, since each element only couples two
adjacent nodes.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">plt</span><span class="o">.</span><span class="n">matshow</span><span class="p">(</span><span class="n">k_model</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Visualization of Stiffness Matrix"</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>Text(0.5, 1.0, 'Visualization of Stiffness Matrix')
</code></pre></div>
<p><img alt="cell-16-output-2" src="https://www.kloppenborg.ca/2022/06/bow-stiffness/bow-stiffness_files/figure-commonmark_x/cell-16-output-2.png"></p>
<p>Next, we will create the load vector. This vector will have all entries
set to zero except for the one corresponding to the first <span class="caps">DOF</span>
(the transverse displacement) of the loading node.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">p_model</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">))</span>
<span class="n">p_model</span><span class="p">[</span><span class="n">nid_l</span> <span class="o">*</span> <span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="o">-</span><span class="mf">8.9075</span> <span class="c1"># 2 lb in N</span>
</code></pre></div>
<p>Next, we’ll remove the constrained DOFs from the stiffness matrix and
the load vector. In our case, those DOFs are the transverse displacement
of the constrained nodes.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">mask</span> <span class="o">=</span> <span class="p">[</span><span class="n">i</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">p_model</span><span class="p">)</span>
<span class="k">if</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">nid_s1</span> <span class="o">*</span> <span class="mi">2</span> <span class="ow">and</span> <span class="n">i</span> <span class="o">!=</span> <span class="n">nid_s2</span> <span class="o">*</span> <span class="mi">2</span><span class="p">]</span>
<span class="n">p_const</span> <span class="o">=</span> <span class="n">p_model</span><span class="p">[</span><span class="n">mask</span><span class="p">]</span>
<span class="n">k_const</span> <span class="o">=</span> <span class="n">k_model</span><span class="p">[</span><span class="n">mask</span><span class="p">,</span> <span class="p">:]</span>
<span class="n">k_const</span> <span class="o">=</span> <span class="n">k_const</span><span class="p">[:,</span> <span class="n">mask</span><span class="p">]</span>
</code></pre></div>
<p>Now, we can solve for the deflections:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">scipy.linalg</span>
<span class="n">d_const</span> <span class="o">=</span> <span class="n">scipy</span><span class="o">.</span><span class="n">linalg</span><span class="o">.</span><span class="n">solve</span><span class="p">(</span><span class="n">k_const</span><span class="p">,</span> <span class="n">p_const</span><span class="p">)</span>
</code></pre></div>
<p>Now, we can add the constrained DOFs back into the displacement
solution. These entries will be zero because those DOFs were constrained.</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">d_model</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="mi">2</span> <span class="o">*</span> <span class="nb">len</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">))</span>
<span class="n">d_model</span><span class="p">[</span><span class="n">mask</span><span class="p">]</span> <span class="o">=</span> <span class="n">d_const</span>
</code></pre></div>
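<p>The reduce/solve/scatter pattern above can be shown in miniature with a hypothetical three-DOF system (the matrix and load below are invented for illustration; <code>np.ix_</code> is an alternative to the two-step row and column slicing used above):</p>

```python
import numpy as np

# Hypothetical 3-DOF system: stiffness matrix and load vector
k = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  1.0]])
p = np.array([0.0, 0.0, 1.0])

constrained = {0}  # DOF 0 is fixed
mask = [i for i in range(len(p)) if i not in constrained]

# Remove the constrained rows and columns, then solve the reduced system
p_const = p[mask]
k_const = k[np.ix_(mask, mask)]
d_const = np.linalg.solve(k_const, p_const)

# Scatter the solution back, leaving zeros at the constrained DOFs
d = np.zeros(len(p))
d[mask] = d_const
```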
<p>Now, we can plot the results:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="n">plt</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">x_nodes</span><span class="p">,</span> <span class="n">d_model</span><span class="p">[</span><span class="mi">0</span><span class="p">::</span><span class="mi">2</span><span class="p">])</span>
<span class="n">plt</span><span class="o">.</span><span class="n">grid</span><span class="p">()</span>
<span class="n">plt</span><span class="o">.</span><span class="n">title</span><span class="p">(</span><span class="s2">"Deflection"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">xlabel</span><span class="p">(</span><span class="s2">"x"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">ylabel</span><span class="p">(</span><span class="s2">"Vertical Deflection"</span><span class="p">)</span>
<span class="n">plt</span><span class="o">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div>
<p><img alt="cell-21-output-1" src="https://www.kloppenborg.ca/2022/06/bow-stiffness/bow-stiffness_files/figure-commonmark_x/cell-21-output-1.png"></p>
<p>Stroup values are normally given in thousandths of an inch, which we can
calculate as follows:</p>
<div class="cell-code highlight"><pre><span></span><code><span class="o">-</span><span class="n">d_model</span><span class="p">[</span><span class="n">nid_l</span> <span class="o">*</span> <span class="mi">2</span><span class="p">]</span> <span class="o">/</span> <span class="mf">25.4</span> <span class="o">*</span> <span class="mi">1000</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="mf">301.20020904559294</span><span class="w"></span>
</code></pre></div>
<h1>Conclusion</h1>
<p>This blog post describes a way of numerically finding the relationship
between the stiffness of a violin bow and its taper. We used the finite
element method to do so. I’m planning on developing an online calculator
for performing this computation. I intend to use an early version of
<a href="https://pyscript.net/"><code>py-script</code></a> to do so, but since I’ve never used
<code>py-script</code>, it’s possible that it will take a while to figure it out.</p>Speeding up Quadrature2021-09-18T00:00:00-04:002021-09-18T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2021-09-18:/2021/09/speeding-up-quadrature/<p>Up until recently, I hadn’t really thought about the way that numerical integration was performed.
Sure, I knew about some techniques like using the trapezoid rule to perform numerical integration,
and without thinking about it too much, I had just assumed that the integration routines like
R’s <code>integrate …</code></p><p>Up until recently, I hadn’t really thought about the way that numerical integration was performed.
Sure, I knew about some techniques like using the trapezoid rule to perform numerical integration,
and without thinking about it too much, I had just assumed that the integration routines like
R’s <code>integrate</code> function used this technique too. But, I was wrong — most libraries that implement
numerical integration use adaptive quadrature.</p>
<p>Adaptive quadrature is actually a rather interesting technique. I won’t go into too much detail
here, but the function being integrated (the integrand) is evaluated at a number of points
within the integration range, and the function values are multiplied by a set of weights.
In mathematical terms:</p>
<p>$$
\int_a^b f\left(x\right) dx \approx \sum_i^n w_i f\left(x_i\right) $$</p>
<p>Where the weights, $w_i$ and the evaluation points, $x_i$ are tabulated values.
These values can be taken from references such as <a href='#HandbookMathFunctions' id='ref-HandbookMathFunctions-1'>
Abramowitz (1972)
</a>.</p>
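<p>As a small illustration of this weighted-sum form, here is a fixed-order rule using NumPy’s tabulated Gauss–Legendre nodes and weights (the integrand and the order $n=7$ are arbitrary choices for the example):</p>

```python
import numpy as np

def gauss_quad(f, a, b, n=7):
    """Approximate the integral of f on [a, b] with an n-point Gauss-Legendre rule."""
    # Tabulated nodes x_i and weights w_i for the reference interval [-1, 1]
    x, w = np.polynomial.legendre.leggauss(n)
    # Rescale the nodes and weights from [-1, 1] to [a, b]
    xm = 0.5 * (a + b) + 0.5 * (b - a) * x
    return 0.5 * (b - a) * np.sum(w * f(xm))

approx = gauss_quad(np.exp, 0.0, 1.0)  # exact value is e - 1
```

<p>Even with only seven evaluation points, the result agrees with $e - 1$ to near machine precision for this smooth integrand, which is why quadrature rules are preferred over the trapezoid rule.</p>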
<p>The <a href="http://www.gnu.org/software/gsl/"><span class="caps">GNU</span> Scientific Library</a>
uses two different sets of $w_i$ and $x_i$: the first set
are 15-point Kronrod weights, and the second set are 7-point
Gaussian weights. The estimate of the integral is computed using each of these two sets of weights,
and the absolute value of the difference between the two results is an upper bound on the error.</p>
<p>If the error is too great, the range is sub-divided and the integral of each sub-divided
range is summed to produce the complete integral — as are the error estimates. This sub-division
procedure is the “adaptive” part of adaptive quadrature.</p>
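<p>A bare-bones sketch of that subdivision logic in Python, using a pair of trapezoid estimates in place of the 15-point Kronrod and 7-point Gauss pair, purely to show the adaptive part:</p>

```python
def adaptive_quad(f, a, b, tol=1e-6):
    """Recursively subdivide [a, b] until two estimates agree within tol."""
    m = 0.5 * (a + b)
    # Two estimates of the same integral: one coarse, one refined
    coarse = 0.5 * (b - a) * (f(a) + f(b))             # trapezoid on [a, b]
    fine = 0.25 * (b - a) * (f(a) + 2 * f(m) + f(b))   # trapezoid on each half
    if abs(fine - coarse) < tol:
        return fine
    # Error estimate too large: subdivide, splitting the error budget,
    # and sum the results of the two halves
    return adaptive_quad(f, a, m, tol / 2) + adaptive_quad(f, m, b, tol / 2)

result = adaptive_quad(lambda x: x ** 2, 0.0, 1.0)  # exact value is 1/3
```

<p>A production implementation would reuse function evaluations between levels and use a higher-order rule pair, but the subdivision idea is the same.</p>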
<p>I’ve been working on a computational problem that involves the computation of an expression
of the following form:</p>
<p>$$
\frac{
\int_{-\infty}^\lambda g(t)A(t)dt + \int_{\lambda}^\infty h(t)A(t)dt
}{
\int_{-\infty}^{\infty}A(t)dt
} $$</p>
<p>In my particular problem, $A(t)$ is expensive to compute, while
$g(t)$ and $h(t)$ are relatively computationally cheap.</p>
<p>In my use case, I need to compute this integral many times with slightly different
$g(t)$ and $h(t)$ functions, but with the $A(t)$ function identical each time.</p>
<p>For now, let’s ignore the integration bounds for these three integrals. We’ll revisit the bounds
shortly. The quadrature estimate of the first integral (containing $g(t)$) will be:</p>
<p>$$
\int g(t) A(t) dt \approx \sum_i^n w_i f(x_i)
= \sum_i^n w_i g(x_i) A(x_i) $$</p>
<p>Thus, we can pre-compute the values of $A(x_i)$ once and avoid computing them again.
A similar procedure can be used for the other two integrals in the original expression.</p>
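<p>A minimal sketch of this caching idea in Python. The nodes and weights here are a fixed Gauss–Legendre rule on $[0, 1]$, and the particular $g$ and $A$ are placeholders chosen so the answers are known in closed form, not the functions from my problem:</p>

```python
import numpy as np

# Fixed quadrature nodes and weights on [0, 1]
# (15-point Gauss-Legendre, standing in for the Kronrod rule)
a, b = 0.0, 1.0
t, w = np.polynomial.legendre.leggauss(15)
t = 0.5 * (a + b) + 0.5 * (b - a) * t
w = 0.5 * (b - a) * w

# A(t) is the expensive part: evaluate it once at the nodes and cache it
A_cached = np.exp(-t ** 2)  # placeholder for the expensive A(t)

def integral_with_cached_A(g):
    """Compute sum_i w_i g(t_i) A(t_i), reusing the cached A values."""
    return np.sum(w * g(t) * A_cached)

# Many cheap g functions against the single cached A
r1 = integral_with_cached_A(lambda t: 2.0 * t)        # exact: 1 - exp(-1)
r2 = integral_with_cached_A(lambda t: np.ones_like(t))
```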
<p>I’ve implemented this approach of pre-computing the values of $A(x_i)$ in C++.
I’ve run this several times with different repetitions and compared the speed
to a “naive” approach where the complete integration is performed each time.
The results are as follows:</p>
<table>
<thead>
<tr>
<th>Repetitions</th>
<th>Naive Approach</th>
<th>Pre-Computing $A(x_i)$</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0.485 ms</td>
<td>1.625 ms</td>
</tr>
<tr>
<td>10</td>
<td>3.65 ms</td>
<td>1.42 ms</td>
</tr>
<tr>
<td>100</td>
<td>57.4 ms</td>
<td>1.34 ms</td>
</tr>
<tr>
<td>1000</td>
<td>317 ms</td>
<td>1.24 ms</td>
</tr>
<tr>
<td>10000</td>
<td>2645 ms</td>
<td>1.11 ms</td>
</tr>
</tbody>
</table>
<p>Using the naive approach, the time scales roughly linearly with the number of
repetitions, while the approach where we pre-compute the value of $A(x_i)$ is
roughly constant regardless of the number of repetitions. The specific values
shown here are based on a single run of the code, so the results will be affected
by whatever else my <span class="caps">PC</span> was doing at the time, but we can still see general trends.</p>
<p>Returning to the discussion of the integration bounds:
first, the bounds of the two integrals in the numerator and the bounds
of the integral in the denominator are all different. To account for this,
we compute the integral using the widest bounds, subdividing the range as
required to achieve a suitable error estimate. Then, for the smaller range,
we choose the subdivisions that are within the new range, adding a smaller
subdivision at one end if needed.</p>
<p>Second, you’ll notice that
some of the integration bounds are infinite. This is handled by a clever trick
that I would not have thought of myself — a change of variables. In my code, I’ve
used a $\tan$ transformation; in the <span class="caps">GNU</span> Scientific Library, they use a different
transform that contains a singularity. This singularity is okay if you’re not
altering the integration bounds after starting the computation
(which <span class="caps">GSL</span> does not), but can lead to trouble otherwise.
After this change of variables, the integration becomes:</p>
<p>$$
\int_{-\infty}^\infty f(x) dx
= \int_{-\pi/2}^{\pi/2} \frac{f(\tan(t))}{\cos^2(t)} dt $$</p>
<p>With this transformation, the integration bounds become finite.</p>
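<p>A sketch of this substitution in Python. Note the $1/\cos^2(t)$ Jacobian that comes from $dx = dt/\cos^2(t)$; the fixed-order rule and the test integrand are my own choices for illustration:</p>

```python
import numpy as np

def integrate_real_line(f, n=101):
    """Approximate the integral of f over (-inf, inf) using x = tan(t)."""
    # Gauss-Legendre nodes and weights, rescaled from [-1, 1] to (-pi/2, pi/2)
    t, w = np.polynomial.legendre.leggauss(n)
    t = 0.5 * np.pi * t
    w = 0.5 * np.pi * w
    # dx = dt / cos^2(t), so the transformed integrand is f(tan(t)) / cos^2(t)
    return np.sum(w * f(np.tan(t)) / np.cos(t) ** 2)

# The standard normal density should integrate to 1
total = integrate_real_line(lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi))
```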
<p>Using these few tricks, the quadrature for this particular problem
can be sped up significantly. These tricks won’t work for all
problems, though.</p>Long-Running Vignettes for R Packages2021-06-21T00:00:00-04:002021-06-21T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2021-06-21:/2021/06/long-running-vignettes/<p>I’m going to release a new version of <a href="https://www.cmstatr.net"><code>cmstatr</code></a>
soon. This new version includes, amongst other things, a new vignette.
In R packages, a vignette is a type of long-form documentation.
This particular vignette includes a simulation study that helps to
demonstrate the validity of a particular statistical method …</p><p>I’m going to release a new version of <a href="https://www.cmstatr.net"><code>cmstatr</code></a>
soon. This new version includes, amongst other things, a new vignette.
In R packages, a vignette is a type of long-form documentation.
This particular vignette includes a simulation study that helps to
demonstrate the validity of a particular statistical method. This simulation
study takes a long time to run, though. It takes long enough that I
don’t want to sit and wait for it to run every time I check that package,
and I don’t want to waste resources on the <code>CRAN</code> servers and force their
servers to re-run my vignette every time they check that package.</p>
<p>Jeroen Ooms wrote a <a href="https://ropensci.org/blog/2019/12/08/precompute-vignettes/">blog post at rOpenSci</a>
about this topic. I decided to follow the advice in that blog post and
pre-compute the new vignette on my computer, and avoid having to re-run
it every time the package is checked. The blog post doesn’t include all
of the necessary information for vignettes that include graphs, though.
This present blog post is intended to fill in that gap.</p>
<p>The basic idea is that you take your long-running vignette and rename it
with the extension <code>.Rmd.orig</code> so that R (and <span class="caps">CRAN</span>) doesn’t try to
build it, because it doesn’t recognize it as an RMarkdown file. Then you
write a script that invokes <code>knitr</code> to run the executable code
in the vignette and write a <code>.Rmd</code> file where the code is no longer
executable. With this approach, when R tries to re-build the vignette,
none of the code is executable, and it runs almost instantly.</p>
<p>In the case of the new vignette being added to <code>cmstatr</code>, the filename
of the vignette is <code>hk_ext.Rmd</code>.</p>
<p>The first step is easy. Just rename the vignette from <code>hk_ext.Rmd</code>
to <code>hk_ext.Rmd.orig</code>.</p>
<p>If we were to run the function <code>knitr::knit("hk_ext.Rmd.orig", output = "hk_ext.Rmd")</code>,
it would create the <code>.Rmd</code> file with the executable code turned into
non-executable code, and with the results of the code included.
The figures would be located in folder <code>figures/</code> and referenced
by the resulting markdown file. However, the path to <code>figures/</code> will
be relative to the current working directory. This is a problem, since
the current working directory will (likely) be the root directory of
the package, and the vignettes are stored in the <code>vignettes/</code>
sub-folder.</p>
<p>We can fix this problem by using the following script to re-build
the vignette. I’ve saved this script with the very verbose filename
<code>rebuild-long-running-vignette.R</code>.</p>
<div class="highlight"><pre><span></span><code><span class="n">old_wd</span> <span class="o"><-</span> <span class="nf">getwd</span><span class="p">()</span>
<span class="nf">setwd</span><span class="p">(</span><span class="s">"vignettes/"</span><span class="p">)</span>
<span class="n">knitr</span><span class="o">::</span><span class="nf">knit</span><span class="p">(</span><span class="s">"hk_ext.Rmd.orig"</span><span class="p">,</span> <span class="n">output</span> <span class="o">=</span> <span class="s">"hk_ext.Rmd"</span><span class="p">)</span>
<span class="n">knitr</span><span class="o">::</span><span class="nf">purl</span><span class="p">(</span><span class="s">"hk_ext.Rmd.orig"</span><span class="p">,</span> <span class="n">output</span> <span class="o">=</span> <span class="s">"hk_ext.R"</span><span class="p">)</span>
<span class="nf">setwd</span><span class="p">(</span><span class="n">old_wd</span><span class="p">)</span>
</code></pre></div>
<p>This sets the working directory to the <code>vignettes/</code> sub-folder, rebuilds
the vignette, then sets the working directory back to what it originally was.</p>
<p>We also need to make a change to the setup chunk of our vignette
(<code>hk_ext.Rmd.orig</code>). This will tell <code>knitr</code> to put the resulting figures
in the same folder as the vignette, rather than a sub-folder.</p>
<div class="highlight"><pre><span></span><code><span class="n">knitr</span><span class="o">::</span><span class="n">opts_chunk</span><span class="o">$</span><span class="nf">set</span><span class="p">(</span>
<span class="n">collapse</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span>
<span class="n">comment</span> <span class="o">=</span> <span class="s">"#>"</span><span class="p">,</span>
<span class="n">fig.path</span> <span class="o">=</span> <span class="s">""</span> <span class="c1"># Added this line to the standard setup chunk</span>
<span class="p">)</span>
</code></pre></div>
<p>Now to rebuild the vignette, you just run the script
<code>rebuild-long-running-vignette.R</code>. This script should be added to
<code>.Rbuildignore</code> so that it doesn’t get included in the built package.
Similarly, the <code>.Rmd.orig</code> file needs to be added to the
<code>.Rbuildignore</code> file.</p>
<p>The other issue is remembering to update the vignette, now that it’s not
automatic. I personally use <code>devtools</code> to release packages to <span class="caps">CRAN</span>.
When you run <code>devtools::release()</code> it asks you a bunch of standard
questions. It’s possible to add extra questions according to the
<a href="https://devtools.r-lib.org/reference/release.html">documentation</a>.
So, I’ve added the following un-exported function to the package:</p>
<div class="highlight"><pre><span></span><code><span class="n">release_questions</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">()</span> <span class="p">{</span>
<span class="nf">c</span><span class="p">(</span>
<span class="s">"Did you re-build the hk_ext.Rmd using `rebuild-long-running-vignette.R`?"</span>
<span class="p">)</span>
<span class="p">}</span>
</code></pre></div>Calculating Extended Hanson—Koopmans Tolerance Limits2021-06-12T00:00:00-04:002021-06-12T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2021-06-12:/2021/06/calculating-ext-hk-tolerance-limits/<p>Calculating tolerance limits — such as A-Basis and B-Basis is an
important part of developing and certifying composite structure for
aircraft. When the data doesn’t fit a convenient parametric distribution
like a Normal or Weibull distribution, one often resorts to
non-parametric methods. Several non-parametric methods exist for
determining tolerance limits …</p><p>Calculating tolerance limits — such as A-Basis and B-Basis — is an
important part of developing and certifying composite structure for
aircraft. When the data doesn’t fit a convenient parametric distribution
like a Normal or Weibull distribution, one often resorts to
non-parametric methods. Several non-parametric methods exist for
determining tolerance limits.</p>
<p>Vangel’s paper <a href='#Vangel1994' id='ref-Vangel1994-1'>
Vangel (1994)
</a> discusses a non-parametric method for
determining tolerance limits. This article provides a brief summary of
that work and discusses the implementation of that method in the R
language, as well as some choices that can be made in the implementation.</p>
<p>This method of calculating non-parametric tolerance limits is an
extension of the Hanson—Koopmans method <a href='#Hanson1964' id='ref-Hanson1964-1'>
Hanson and Koopmans (1964)
</a>. The lower
tolerance can be calculated using the following formula:</p>
<p>$$
T_L = x_{(j)}\left[\frac{x_{(i)}}{x_{(j)}}\right]^z $$</p>
<p>where $x_{(i)}$ and $x_{(j)}$ indicate the $i$th and $j$th order
statistic of the sample (that is, the $i$th smallest and the $j$th
smallest value).</p>
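<p>The formula itself is simple to evaluate once $z$ is known. A minimal Python sketch follows; the function name, sample values, $j$, and $z$ are all invented for illustration, since in practice $z$ must be determined as described next:</p>

```python
import numpy as np

def hk_ext_lower_bound(x, i, j, z):
    """Lower tolerance bound T_L = x_(j) * (x_(i) / x_(j)) ** z.

    i and j are one-based ranks of the order statistics, as in the article.
    """
    xs = np.sort(np.asarray(x, dtype=float))
    return xs[j - 1] * (xs[i - 1] / xs[j - 1]) ** z

# Fabricated strength data and a made-up z, purely for illustration
sample = [93.1, 98.4, 101.2, 102.7, 104.0, 105.3, 107.8, 109.1]
t_l = hk_ext_lower_bound(sample, i=1, j=5, z=1.5)
```

<p>With $z > 1$ and $x_{(i)} < x_{(j)}$, the bound falls below the smallest observation in the sample.</p>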
<p>The values of $j$ and $z$ need to be determined somehow.</p>
<p>There is a function $H(z)$ defined as follows:</p>
<p>$$
H(z) = Pr\left[T(z) \ge \log(1 - \beta)\right] $$</p>
<p>where $\beta$ is the content of the desired tolerance limit. The details
are outside the scope of this article, but we can write a function that
solves the following equation for $z$.</p>
<p>$$
H(z) = \gamma $$</p>
<p>where $\gamma$ is the confidence of the desired tolerance limit.</p>
<p>It turns out that we obtain different values of $z$ depending on which
values of $i$ and $j$ we choose.</p>
<p>Vangel’s approach is to set $i=1$ in all cases, then to find the value
of $j$ that would produce a tolerance limit that is nearest to the
population quantile assuming that the data is distributed according to a
standard normal distribution.</p>
<p>We’ll investigate this approach through simulation. First, we’ll load a
few packages.</p>
<div class="highlight"><pre><span></span><code><span class="nf">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span>
<span class="nf">library</span><span class="p">(</span><span class="n">cmstatr</span><span class="p">)</span>
</code></pre></div>
<p>Next, we’ll set the value of $i=1$ and a value of the content and
confidence of our tolerance limit. We’ll choose B-Basis tolerance limits
as an example.</p>
<div class="highlight"><pre><span></span><code><span class="n">i</span> <span class="o"><-</span> <span class="m">1</span>
<span class="n">p</span> <span class="o"><-</span> <span class="m">0.90</span>
<span class="n">conf</span> <span class="o"><-</span> <span class="m">0.95</span>
</code></pre></div>
<p>The expected value of the $i$th order statistic for a normally
distributed sample can be calculated using the following function (see
<a href='#Harter1961' id='ref-Harter1961-1'>
Harter (1961)
</a>). We’ll need this function soon.</p>
<div class="highlight"><pre><span></span><code><span class="n">expected_order_statistic</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="n">int</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="n">x</span> <span class="o">*</span> <span class="nf">pnorm</span><span class="p">(</span><span class="o">-</span><span class="n">x</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="m">1</span><span class="p">)</span> <span class="o">*</span> <span class="nf">pnorm</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">^</span> <span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="n">i</span><span class="p">)</span> <span class="o">*</span> <span class="nf">dnorm</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">integral</span> <span class="o"><-</span> <span class="nf">integrate</span><span class="p">(</span><span class="n">int</span><span class="p">,</span> <span class="o">-</span><span class="kc">Inf</span><span class="p">,</span> <span class="kc">Inf</span><span class="p">)</span>
<span class="nf">stopifnot</span><span class="p">(</span><span class="n">integral</span><span class="o">$</span><span class="n">message</span> <span class="o">==</span> <span class="s">"OK"</span><span class="p">)</span>
<span class="nf">factorial</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="nf">factorial</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="n">i</span><span class="p">)</span> <span class="o">*</span> <span class="nf">factorial</span><span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="m">1</span><span class="p">))</span> <span class="o">*</span> <span class="n">integral</span><span class="o">$</span><span class="n">value</span>
<span class="p">}</span>
</code></pre></div>
<p>When using Vangel’s approach, we need to minimize the value of the
following function.</p>
<div class="highlight"><pre><span></span><code><span class="n">fcn</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">j</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="n">e1</span> <span class="o"><-</span> <span class="nf">expected_order_statistic</span><span class="p">(</span><span class="n">i</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
<span class="n">e2</span> <span class="o"><-</span> <span class="nf">expected_order_statistic</span><span class="p">(</span><span class="n">j</span><span class="p">,</span> <span class="n">n</span><span class="p">)</span>
<span class="n">z</span> <span class="o"><-</span> <span class="nf">hk_ext_z</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">conf</span><span class="p">)</span>
<span class="nf">abs</span><span class="p">(</span><span class="n">z</span> <span class="o">*</span> <span class="n">e1</span> <span class="o">+</span> <span class="p">(</span><span class="m">1</span> <span class="o">-</span> <span class="n">z</span><span class="p">)</span> <span class="o">*</span> <span class="n">e2</span> <span class="o">-</span> <span class="nf">qnorm</span><span class="p">(</span><span class="n">p</span><span class="p">))</span>
<span class="p">}</span>
</code></pre></div>
<p>We can plot the above function versus $j$ for the value of $n=17$:</p>
<div class="highlight"><pre><span></span><code><span class="nf">data.frame</span><span class="p">(</span>
<span class="n">j</span> <span class="o">=</span> <span class="nf">seq</span><span class="p">(</span><span class="m">7</span><span class="p">,</span> <span class="m">11</span><span class="p">,</span> <span class="n">by</span> <span class="o">=</span> <span class="m">0.1</span><span class="p">)</span>
<span class="p">)</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">fcn</span> <span class="o">=</span> <span class="nf">Vectorize</span><span class="p">(</span><span class="n">fcn</span><span class="p">)(</span><span class="n">j</span><span class="p">,</span> <span class="m">17</span><span class="p">))</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">j</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">fcn</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_line</span><span class="p">()</span> <span class="o">+</span>
<span class="nf">geom_point</span><span class="p">(</span>
<span class="n">data</span> <span class="o">=</span> <span class="nf">data.frame</span><span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="m">7</span><span class="o">:</span><span class="m">11</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">fcn</span> <span class="o">=</span> <span class="nf">Vectorize</span><span class="p">(</span><span class="n">fcn</span><span class="p">)(</span><span class="n">j</span><span class="p">,</span> <span class="m">17</span><span class="p">)),</span>
<span class="n">mapping</span> <span class="o">=</span> <span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">j</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">fcn</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-5-1" src="https://www.kloppenborg.ca/2021/06/calculating-ext-hk-tolerance-limits/Calculating-Ext-HK-Tolerance-Limits_files/figure-markdown/unnamed-chunk-5-1.png"></p>
<p>In this particular case, we can see that $j=9$ produces the minimum
value of this function (for integer values of $j$). But the value at
$j=8$ is not much worse.</p>
<p>Of note, there is a table of optimum values of $j$ for various values of
$n$ published in <span class="caps">CMH</span>-17-1G<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. For most values of $n$,
the optimum value from the function above matches the published value.
However, for samples of size 17, 20, 23, 24 and 28, the function above
disagrees with the published values by one unit. We will focus the
simulation effort on samples of these sizes. For sample sizes of
interest, the following values of $j$ and $z$ are published in
<span class="caps">CMH</span>-17-1G.</p>
<div class="highlight"><pre><span></span><code><span class="n">published_r_n</span> <span class="o"><-</span> <span class="nf">tribble</span><span class="p">(</span>
<span class="o">~</span><span class="n">n</span><span class="p">,</span> <span class="o">~</span><span class="n">j_pub</span><span class="p">,</span> <span class="o">~</span><span class="n">z_pub</span><span class="p">,</span>
<span class="m">17</span><span class="p">,</span> <span class="m">8</span><span class="p">,</span> <span class="m">1.434</span><span class="p">,</span>
<span class="m">20</span><span class="p">,</span> <span class="m">10</span><span class="p">,</span> <span class="m">1.253</span><span class="p">,</span>
<span class="m">23</span><span class="p">,</span> <span class="m">11</span><span class="p">,</span> <span class="m">1.143</span><span class="p">,</span>
<span class="m">24</span><span class="p">,</span> <span class="m">11</span><span class="p">,</span> <span class="m">1.114</span><span class="p">,</span>
<span class="m">28</span><span class="p">,</span> <span class="m">12</span><span class="p">,</span> <span class="m">1.010</span>
<span class="p">)</span>
</code></pre></div>
<p>We can create an R function that returns the “optimum” value of $j$:
it considers all integer values of $j$ and returns the one that
minimizes the function above. Such an R function is as follows:</p>
<div class="highlight"><pre><span></span><code><span class="n">optim_j</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="p">{</span>
<span class="n">j</span> <span class="o"><-</span> <span class="m">2</span><span class="o">:</span><span class="n">n</span>
<span class="n">f</span> <span class="o"><-</span> <span class="nf">sapply</span><span class="p">(</span><span class="m">2</span><span class="o">:</span><span class="n">n</span><span class="p">,</span> <span class="nf">function</span><span class="p">(</span><span class="n">j</span><span class="p">)</span> <span class="nf">Vectorize</span><span class="p">(</span><span class="n">fcn</span><span class="p">)(</span><span class="n">j</span><span class="p">,</span> <span class="n">n</span><span class="p">))</span>
<span class="n">j</span><span class="p">[</span><span class="n">f</span> <span class="o">==</span> <span class="nf">min</span><span class="p">(</span><span class="n">f</span><span class="p">)]</span>
<span class="p">}</span>
</code></pre></div>
<p>For values of $n$ of interest, we’ll generate a large number of samples
(10,000) drawn from a normal distribution. We can calculate the true
population quantile, since we know the population parameters. We can use
the two variations of the nonparametric tolerance limit approach to
calculate tolerance limits. The proportion of those tolerance limits
that are below the population quantile should equal the selected
confidence level. We’ll restrict the simulation to values of $n$ where
we find different values of $j$ compared with those published in
<span class="caps">CMH</span>-17-1G.</p>
<div class="highlight"><pre><span></span><code><span class="n">mu_normal</span> <span class="o"><-</span> <span class="m">100</span>
<span class="n">sd_normal</span> <span class="o"><-</span> <span class="m">6</span>
<span class="nf">set.seed</span><span class="p">(</span><span class="m">1234567</span><span class="p">)</span> <span class="c1"># make this reproducible</span>
<span class="n">sim_normal</span> <span class="o"><-</span> <span class="nf">pmap_dfr</span><span class="p">(</span><span class="n">published_r_n</span><span class="p">,</span> <span class="nf">function</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">j_pub</span><span class="p">,</span> <span class="n">z_pub</span><span class="p">)</span> <span class="p">{</span>
<span class="n">j_opt</span> <span class="o"><-</span> <span class="nf">optim_j</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="n">z_opt</span> <span class="o"><-</span> <span class="nf">hk_ext_z</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j_opt</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">conf</span><span class="p">)</span>
<span class="nf">map_dfr</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">10000</span><span class="p">,</span> <span class="nf">function</span><span class="p">(</span><span class="n">i_sim</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">tibble</span><span class="p">(</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">n</span><span class="p">,</span>
<span class="n">x</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="nf">sort</span><span class="p">(</span><span class="nf">rnorm</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">mu_normal</span><span class="p">,</span> <span class="n">sd_normal</span><span class="p">))),</span>
<span class="n">j_pub</span> <span class="o">=</span> <span class="n">j_pub</span><span class="p">,</span>
<span class="n">j_opt</span> <span class="o">=</span> <span class="n">j_opt</span><span class="p">,</span>
<span class="n">z_pub</span> <span class="o">=</span> <span class="n">z_pub</span><span class="p">,</span>
<span class="n">z_opt</span> <span class="o">=</span> <span class="n">z_opt</span><span class="p">,</span>
<span class="p">)</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="p">})</span> <span class="o">%>%</span>
<span class="nf">rowwise</span><span class="p">()</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span>
<span class="n">T_pub</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">j_pub</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="n">x</span><span class="p">[</span><span class="n">j_pub</span><span class="p">])</span> <span class="o">^</span> <span class="n">z_pub</span><span class="p">,</span>
<span class="n">T_opt</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">j_opt</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="n">x</span><span class="p">[</span><span class="n">j_opt</span><span class="p">])</span> <span class="o">^</span> <span class="n">z_opt</span>
<span class="p">)</span>
<span class="n">sim_normal</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## # A tibble: 50,000 × 8
## # Rowwise:
## n x j_pub j_opt z_pub z_opt T_pub T_opt
## <dbl> <list> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 17 <dbl [17]> 8 9 1.43 1.40 85.4 85.4
## 2 17 <dbl [17]> 8 9 1.43 1.40 89.5 89.3
## 3 17 <dbl [17]> 8 9 1.43 1.40 83.2 83.5
## 4 17 <dbl [17]> 8 9 1.43 1.40 83.6 83.8
## 5 17 <dbl [17]> 8 9 1.43 1.40 83.4 83.8
## 6 17 <dbl [17]> 8 9 1.43 1.40 84.1 84.4
## 7 17 <dbl [17]> 8 9 1.43 1.40 82.6 82.9
## 8 17 <dbl [17]> 8 9 1.43 1.40 87.5 87.6
## 9 17 <dbl [17]> 8 9 1.43 1.40 83.9 83.8
## 10 17 <dbl [17]> 8 9 1.43 1.40 86.9 87.2
## # … with 49,990 more rows
</code></pre></div>
<p>We can plot the distribution of the tolerance limits that result from
our R code and from the values of $j$ and $z$ published in <span class="caps">CMH</span>-17-1G. We
see that the distributions are very similar.</p>
<div class="highlight"><pre><span></span><code><span class="n">sim_normal</span> <span class="o">%>%</span>
<span class="nf">pivot_longer</span><span class="p">(</span><span class="n">cols</span> <span class="o">=</span> <span class="n">T_pub</span><span class="o">:</span><span class="n">T_opt</span><span class="p">,</span> <span class="n">names_to</span> <span class="o">=</span> <span class="s">"Approach"</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">value</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="n">Approach</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_density</span><span class="p">()</span> <span class="o">+</span>
<span class="nf">facet_wrap</span><span class="p">(</span><span class="n">n</span> <span class="o">~</span> <span class="n">.</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">ggtitle</span><span class="p">(</span><span class="s">"Distribution of Tolerance Limits for Various Values of n"</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-9-1" src="https://www.kloppenborg.ca/2021/06/calculating-ext-hk-tolerance-limits/Calculating-Ext-HK-Tolerance-Limits_files/figure-markdown/unnamed-chunk-9-1.png"></p>
<p>In this article, we’re calculating the B-Basis (lower 90/95 tolerance
limit). So, the population quantile that we’re approximating is:</p>
<div class="highlight"><pre><span></span><code><span class="n">x_p_normal</span> <span class="o"><-</span> <span class="nf">qnorm</span><span class="p">(</span><span class="m">1</span> <span class="o">-</span> <span class="n">p</span><span class="p">,</span> <span class="n">mu_normal</span><span class="p">,</span> <span class="n">sd_normal</span><span class="p">)</span>
<span class="n">x_p_normal</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] 92.31069
</code></pre></div>
<p>We can now determine what proportion of the calculated tolerance limits
were below the population quantile.</p>
<div class="highlight"><pre><span></span><code><span class="n">sim_normal</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">below_pub</span> <span class="o">=</span> <span class="n">T_pub</span> <span class="o"><</span> <span class="n">x_p_normal</span><span class="p">,</span>
<span class="n">below_opt</span> <span class="o">=</span> <span class="n">T_opt</span> <span class="o"><</span> <span class="n">x_p_normal</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">group_by</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">summarise</span><span class="p">(</span>
<span class="n">prop_below_pub</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">below_pub</span><span class="p">)</span> <span class="o">/</span> <span class="nf">n</span><span class="p">(),</span>
<span class="n">prop_below_opt</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">below_opt</span><span class="p">)</span> <span class="o">/</span> <span class="nf">n</span><span class="p">()</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## # A tibble: 5 × 3
## n prop_below_pub prop_below_opt
## <dbl> <dbl> <dbl>
## 1 17 0.964 0.967
## 2 20 0.967 0.965
## 3 23 0.960 0.960
## 4 24 0.959 0.957
## 5 28 0.954 0.954
</code></pre></div>
<p>In all cases, the tolerance limits are conservative when the data are
normally distributed. Remember that we expect that 95% of the tolerance
limits should be below the population quantile: here we see a slightly
higher proportion than 95%.</p>
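<p>As a quick check that this conservatism is not just simulation noise
(an aside, not part of the original analysis), we can compare the observed
proportions with the Monte Carlo standard error of a proportion estimated
from 10,000 trials:</p>

```r
# Standard error of a proportion estimated from 10,000 simulations
# when the true proportion is 0.95
n_sim <- 10000
p_true <- 0.95
mc_se <- sqrt(p_true * (1 - p_true) / n_sim)
mc_se
## [1] 0.002179449
```

<p>Most of the observed proportions sit well above 0.95 relative to this
standard error, so the conservatism appears real rather than an artifact
of the simulation size.</p>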
<p>We can repeat this with a distribution that is far from normal. Let’s
try it with the $\chi^2$ distribution.</p>
<div class="highlight"><pre><span></span><code><span class="n">df_chisq</span> <span class="o"><-</span> <span class="m">6</span>
<span class="nf">set.seed</span><span class="p">(</span><span class="m">2345678</span><span class="p">)</span> <span class="c1"># make this reproducible</span>
<span class="n">sim_chisq</span> <span class="o"><-</span> <span class="nf">pmap_dfr</span><span class="p">(</span><span class="n">published_r_n</span><span class="p">,</span> <span class="nf">function</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">j_pub</span><span class="p">,</span> <span class="n">z_pub</span><span class="p">)</span> <span class="p">{</span>
<span class="n">j_opt</span> <span class="o"><-</span> <span class="nf">optim_j</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="n">z_opt</span> <span class="o"><-</span> <span class="nf">hk_ext_z</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j_opt</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">conf</span><span class="p">)</span>
<span class="nf">map_dfr</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">10000</span><span class="p">,</span> <span class="nf">function</span><span class="p">(</span><span class="n">i_sim</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">tibble</span><span class="p">(</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">n</span><span class="p">,</span>
<span class="n">x</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="nf">sort</span><span class="p">(</span><span class="nf">rchisq</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">df_chisq</span><span class="p">))),</span>
<span class="n">j_pub</span> <span class="o">=</span> <span class="n">j_pub</span><span class="p">,</span>
<span class="n">j_opt</span> <span class="o">=</span> <span class="n">j_opt</span><span class="p">,</span>
<span class="n">z_pub</span> <span class="o">=</span> <span class="n">z_pub</span><span class="p">,</span>
<span class="n">z_opt</span> <span class="o">=</span> <span class="n">z_opt</span><span class="p">,</span>
<span class="p">)</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="p">})</span> <span class="o">%>%</span>
<span class="nf">rowwise</span><span class="p">()</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span>
<span class="n">T_pub</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">j_pub</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="n">x</span><span class="p">[</span><span class="n">j_pub</span><span class="p">])</span> <span class="o">^</span> <span class="n">z_pub</span><span class="p">,</span>
<span class="n">T_opt</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">j_opt</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="n">x</span><span class="p">[</span><span class="n">j_opt</span><span class="p">])</span> <span class="o">^</span> <span class="n">z_opt</span>
<span class="p">)</span>
<span class="n">sim_chisq</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## # A tibble: 50,000 × 8
## # Rowwise:
## n x j_pub j_opt z_pub z_opt T_pub T_opt
## <dbl> <list> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 17 <dbl [17]> 8 9 1.43 1.40 1.00 1.03
## 2 17 <dbl [17]> 8 9 1.43 1.40 1.35 1.34
## 3 17 <dbl [17]> 8 9 1.43 1.40 1.39 1.40
## 4 17 <dbl [17]> 8 9 1.43 1.40 1.39 1.28
## 5 17 <dbl [17]> 8 9 1.43 1.40 1.39 1.43
## 6 17 <dbl [17]> 8 9 1.43 1.40 0.283 0.297
## 7 17 <dbl [17]> 8 9 1.43 1.40 0.514 0.497
## 8 17 <dbl [17]> 8 9 1.43 1.40 0.264 0.268
## 9 17 <dbl [17]> 8 9 1.43 1.40 1.68 1.61
## 10 17 <dbl [17]> 8 9 1.43 1.40 0.661 0.692
## # … with 49,990 more rows
</code></pre></div>
<p>The population quantile is:</p>
<div class="highlight"><pre><span></span><code><span class="n">x_p_chisq</span> <span class="o"><-</span> <span class="nf">qchisq</span><span class="p">(</span><span class="m">1</span> <span class="o">-</span> <span class="n">p</span><span class="p">,</span> <span class="n">df_chisq</span><span class="p">)</span>
<span class="n">x_p_chisq</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] 2.204131
</code></pre></div>
<p>The following plot shows the distributions of the tolerance limits
calculated using the values of $j$ and $z$ that we computed and those
published. Again, the distributions are very similar.</p>
<div class="highlight"><pre><span></span><code><span class="n">sim_chisq</span> <span class="o">%>%</span>
<span class="nf">pivot_longer</span><span class="p">(</span><span class="n">cols</span> <span class="o">=</span> <span class="n">T_pub</span><span class="o">:</span><span class="n">T_opt</span><span class="p">,</span> <span class="n">names_to</span> <span class="o">=</span> <span class="s">"Approach"</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">value</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="n">Approach</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_density</span><span class="p">()</span> <span class="o">+</span>
<span class="nf">facet_wrap</span><span class="p">(</span><span class="n">n</span> <span class="o">~</span> <span class="n">.</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">ggtitle</span><span class="p">(</span><span class="s">"Distribution of Tolerance Limits for Various Values of n"</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-14-1" src="https://www.kloppenborg.ca/2021/06/calculating-ext-hk-tolerance-limits/Calculating-Ext-HK-Tolerance-Limits_files/figure-markdown/unnamed-chunk-14-1.png"></p>
<p>We can now determine what proportion of the calculated tolerance limits
were below the population quantile.</p>
<div class="highlight"><pre><span></span><code><span class="n">sim_chisq</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">below_pub</span> <span class="o">=</span> <span class="n">T_pub</span> <span class="o"><</span> <span class="n">x_p_chisq</span><span class="p">,</span>
<span class="n">below_opt</span> <span class="o">=</span> <span class="n">T_opt</span> <span class="o"><</span> <span class="n">x_p_chisq</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">group_by</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">summarise</span><span class="p">(</span>
<span class="n">prop_below_pub</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">below_pub</span><span class="p">)</span> <span class="o">/</span> <span class="nf">n</span><span class="p">(),</span>
<span class="n">prop_below_opt</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">below_opt</span><span class="p">)</span> <span class="o">/</span> <span class="nf">n</span><span class="p">()</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## # A tibble: 5 × 3
## n prop_below_pub prop_below_opt
## <dbl> <dbl> <dbl>
## 1 17 0.963 0.965
## 2 20 0.959 0.959
## 3 23 0.959 0.958
## 4 24 0.955 0.955
## 5 28 0.953 0.953
</code></pre></div>
<p>Again with this distribution, we see that the tolerance limits are conservative.</p>
<p>Finally, let’s try again using a t-Distribution.</p>
<div class="highlight"><pre><span></span><code><span class="n">df_t</span> <span class="o"><-</span> <span class="m">3</span>
<span class="n">offset_t</span> <span class="o"><-</span> <span class="m">150</span>
<span class="nf">set.seed</span><span class="p">(</span><span class="m">4567</span><span class="p">)</span> <span class="c1"># make this reproducible</span>
<span class="n">sim_t</span> <span class="o"><-</span> <span class="nf">pmap_dfr</span><span class="p">(</span><span class="n">published_r_n</span><span class="p">,</span> <span class="nf">function</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">j_pub</span><span class="p">,</span> <span class="n">z_pub</span><span class="p">)</span> <span class="p">{</span>
<span class="n">j_opt</span> <span class="o"><-</span> <span class="nf">optim_j</span><span class="p">(</span><span class="n">n</span><span class="p">)</span>
<span class="n">z_opt</span> <span class="o"><-</span> <span class="nf">hk_ext_z</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">j_opt</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">conf</span><span class="p">)</span>
<span class="nf">map_dfr</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">10000</span><span class="p">,</span> <span class="nf">function</span><span class="p">(</span><span class="n">i_sim</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">tibble</span><span class="p">(</span>
<span class="n">n</span> <span class="o">=</span> <span class="n">n</span><span class="p">,</span>
<span class="n">x</span> <span class="o">=</span> <span class="nf">list</span><span class="p">(</span><span class="nf">sort</span><span class="p">(</span><span class="nf">rt</span><span class="p">(</span><span class="n">n</span><span class="p">,</span> <span class="n">df_t</span><span class="p">)</span> <span class="o">+</span> <span class="n">offset_t</span><span class="p">)),</span>
<span class="n">j_pub</span> <span class="o">=</span> <span class="n">j_pub</span><span class="p">,</span>
<span class="n">j_opt</span> <span class="o">=</span> <span class="n">j_opt</span><span class="p">,</span>
<span class="n">z_pub</span> <span class="o">=</span> <span class="n">z_pub</span><span class="p">,</span>
<span class="n">z_opt</span> <span class="o">=</span> <span class="n">z_opt</span><span class="p">,</span>
<span class="p">)</span>
<span class="p">}</span>
<span class="p">)</span>
<span class="p">})</span> <span class="o">%>%</span>
<span class="nf">rowwise</span><span class="p">()</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span>
<span class="n">T_pub</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">j_pub</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="n">x</span><span class="p">[</span><span class="n">j_pub</span><span class="p">])</span> <span class="o">^</span> <span class="n">z_pub</span><span class="p">,</span>
<span class="n">T_opt</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">j_opt</span><span class="p">]</span> <span class="o">*</span> <span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">/</span> <span class="n">x</span><span class="p">[</span><span class="n">j_opt</span><span class="p">])</span> <span class="o">^</span> <span class="n">z_opt</span>
<span class="p">)</span>
<span class="n">sim_t</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## # A tibble: 50,000 × 8
## # Rowwise:
## n x j_pub j_opt z_pub z_opt T_pub T_opt
## <dbl> <list> <dbl> <int> <dbl> <dbl> <dbl> <dbl>
## 1 17 <dbl [17]> 8 9 1.43 1.40 140. 140.
## 2 17 <dbl [17]> 8 9 1.43 1.40 147. 147.
## 3 17 <dbl [17]> 8 9 1.43 1.40 144. 144.
## 4 17 <dbl [17]> 8 9 1.43 1.40 147. 147.
## 5 17 <dbl [17]> 8 9 1.43 1.40 147. 147.
## 6 17 <dbl [17]> 8 9 1.43 1.40 145. 145.
## 7 17 <dbl [17]> 8 9 1.43 1.40 146. 146.
## 8 17 <dbl [17]> 8 9 1.43 1.40 147. 147.
## 9 17 <dbl [17]> 8 9 1.43 1.40 146. 146.
## 10 17 <dbl [17]> 8 9 1.43 1.40 143. 143.
## # … with 49,990 more rows
</code></pre></div>
<p>The population quantile is:</p>
<div class="highlight"><pre><span></span><code><span class="n">x_p_t</span> <span class="o"><-</span> <span class="nf">qt</span><span class="p">(</span><span class="m">1</span> <span class="o">-</span> <span class="n">p</span><span class="p">,</span> <span class="n">df_t</span><span class="p">)</span> <span class="o">+</span> <span class="n">offset_t</span>
<span class="n">x_p_t</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] 148.3623
</code></pre></div>
<p>The distributions of the tolerance limits using the two approaches are as
follows. Again, the distributions are very similar.</p>
<div class="highlight"><pre><span></span><code><span class="n">sim_t</span> <span class="o">%>%</span>
<span class="nf">pivot_longer</span><span class="p">(</span><span class="n">cols</span> <span class="o">=</span> <span class="n">T_pub</span><span class="o">:</span><span class="n">T_opt</span><span class="p">,</span> <span class="n">names_to</span> <span class="o">=</span> <span class="s">"Approach"</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">value</span><span class="p">,</span> <span class="n">color</span> <span class="o">=</span> <span class="n">Approach</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_density</span><span class="p">()</span> <span class="o">+</span>
<span class="nf">facet_wrap</span><span class="p">(</span><span class="n">n</span> <span class="o">~</span> <span class="n">.</span><span class="p">)</span> <span class="o">+</span>
<span class="nf">ggtitle</span><span class="p">(</span><span class="s">"Distribution of Tolerance Limits for Various Values of n"</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-18-1" src="https://www.kloppenborg.ca/2021/06/calculating-ext-hk-tolerance-limits/Calculating-Ext-HK-Tolerance-Limits_files/figure-markdown/unnamed-chunk-18-1.png"></p>
<p>We can now determine what proportion of the calculated tolerance limits
were below the population quantile.</p>
<div class="highlight"><pre><span></span><code><span class="n">sim_t</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">below_pub</span> <span class="o">=</span> <span class="n">T_pub</span> <span class="o"><</span> <span class="n">x_p_t</span><span class="p">,</span>
<span class="n">below_opt</span> <span class="o">=</span> <span class="n">T_opt</span> <span class="o"><</span> <span class="n">x_p_t</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">group_by</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="o">%>%</span>
<span class="nf">summarise</span><span class="p">(</span>
<span class="n">prop_below_pub</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">below_pub</span><span class="p">)</span> <span class="o">/</span> <span class="nf">n</span><span class="p">(),</span>
<span class="n">prop_below_opt</span> <span class="o">=</span> <span class="nf">sum</span><span class="p">(</span><span class="n">below_opt</span><span class="p">)</span> <span class="o">/</span> <span class="nf">n</span><span class="p">()</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## # A tibble: 5 × 3
## n prop_below_pub prop_below_opt
## <dbl> <dbl> <dbl>
## 1 17 0.958 0.959
## 2 20 0.953 0.952
## 3 23 0.953 0.953
## 4 24 0.954 0.954
## 5 28 0.953 0.953
</code></pre></div>
<p>For this distribution, the tolerance limits are still conservative.</p>
<p>From this simulation work, it appears that both approaches to selecting
the value of $j$ perform equally well. The tolerance limits produced
using each approach for a particular sample will be different, but both
approaches seem to be equally valid.</p>
<p>The R package <code>cmstatr</code> contains the function <code>hk_ext_z_j_opt</code> which
returns $j$ and $z$ for calculating tolerance limits with the
optimization method described here (after version 0.8.0<sup id="fnref:2"><a class="footnote-ref" href="#fn:2">2</a></sup>). While the
tolerance limits found for some particular samples may differ slightly
from that produced by the tables published in <span class="caps">CMH</span>-17-1G, both results
appear to be equally valid.</p>
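<p>For example, the function can be called as follows (a sketch of the
interface, assuming a current version of <code>cmstatr</code> is
installed; the printed output is not shown here):</p>

```r
library(cmstatr)

# j and z for a B-Basis (90/95) nonparametric tolerance limit
# on a sample of 17 observations, using the optimization method
hk_ext_z_j_opt(n = 17, p = 0.90, conf = 0.95)
```

<p>The returned values of $j$ and $z$ can then be used with the first
order statistic in the tolerance limit formula used throughout this
article.</p>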
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>It should be noted that <span class="caps">CMH</span>-17-1G uses $r$ and $k$ instead of $j$
and $z$ as used in this article and in Vangel’s paper. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
<li id="fn:2">
<p><code>cmstatr</code> version 0.8.0 and earlier optimized a slightly
different function. That version of the code produces
slightly different values of $j$ for certain values of $n$. <a class="footnote-backref" href="#fnref:2" title="Jump back to footnote 2 in the text">↩</a></p>
</li>
</ol>
</div>Basis Values From Censored Data2021-02-09T00:00:00-05:002021-02-09T00:00:00-05:00Stefan Kloppenborgtag:www.kloppenborg.ca,2021-02-09:/2021/02/basis-values-censored-data/<p>Earlier, I wrote a post about using a
<a href="https://www.kloppenborg.ca/2021/02/likelihood-basis-values/">likelihood-based approach to calculating Basis values</a>.
In that post, I hinted that likelihood-based approaches can be useful when
dealing with censored data.</p>
<p>First of all, what does censoring mean? It means that the value reported
is either artificially high or artificially low. There are a few reasons
that this could happen. It happens often with lifetime data: with fatigue
tests, you set a number of cycles at which the specimen “runs out” and you
stop the test; with studies of mortality, some of the subjects will still
be alive when you do the analysis. In these cases, the true value is greater
than the observed result, but you don’t know by how much.
These are examples of <em>right-censored</em> data.</p>
<p>Data can also be <em>left-censored</em>, meaning that the true value is less than
the observed value. This can happen if some of the values are too small to
be measured. Perhaps the instrument that you’re using can’t detect values below
a certain amount.</p>
<p>There is also <em>interval-censored</em> data. This often occurs in survey data.
For example, you might have data for individuals aged 40-44, but you don’t
know where they fall within that range.</p>
<p>In this post, we’re going to deal with <em>right-censored</em> data.</p>
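<p>As an aside, right-censored observations are commonly encoded in R as
a value plus a censoring indicator; the <code>survival</code> package
(not used elsewhere in this article) provides the <code>Surv</code>
class for exactly this. The values below are hypothetical:</p>

```r
library(survival)

# Three hypothetical coupons: the first two failed at the recorded
# load; the third "ran out" at 1200 without failing (right-censored)
load_lbf <- c(950, 1010, 1200)
failed <- c(1, 1, 0)  # 1 = failure observed, 0 = right-censored
Surv(load_lbf, failed)
```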
<p>At my day job, I often deal with data from testing of metallic inserts
installed in honeycomb sandwich panel. These metallic inserts have a hole
in their centers that will accept a screw. Their purpose is to allow a screw
to be fastened to the panel, and the strength of this connection is one
of the important considerations.</p>
<p>We determine the strength of the insert through testing. The usual test coupon that we use has
two of these inserts installed, and we pull them away from each other to
measure the shear strength. This
is a convenient way of applying the load, but I’ve long thought that it must
give low results. The loading of the coupon looks like this:</p>
<p><img alt="Coupon" src="https://www.kloppenborg.ca/2021/02/basis-values-censored-data/basis-values-censored-data_files/figure-markdown/coupon.png"></p>
<p>The reason that I’ve thought that this test method will give artificially low
results is the fact that there are two inserts. The test ends when either
one of these two insert fails: the other insert must be stronger than the one
that failed first.</p>
<p>To illustrate this, let’s do a slightly silly thought experiment.
Let’s imagine that we’re making a set of these coupons. We decide that
we’re going to install one insert in each coupon first, then come back
tomorrow and install the other insert in each coupon. Tomorrow comes
around, and we decide to let the brand new intern install the second
insert. The intern hasn’t yet been fully trained, and they accidentally
install the wrong type of insert in the second hole, but unfortunately
they look identical to the correct type. The correct type of insert has
a strength that is always $1000 lbf$, but we don’t know that yet.
The wrong type of insert always has a strength of exactly $500 lbf$.
When we do our tests, all of the coupons fail on the side that the
intern installed (the wrong insert) and the strength of each coupon
is $500 lbf$. We conclude that the mean strength of these inserts
is $500 lbf$ with a very low variance.</p>
<p>But, we’d be wrong.</p>
<p>In this thought experiment, the actual mean strength of the inserts
(considering both the correct and incorrect types of inserts) is
$750 lbf$ and there’s actually a pretty high variance. We were simply
unable to observe the strength of the stronger inserts because of <em>censoring</em>.</p>
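<p>The arithmetic of this thought experiment can be written out directly
(using the hypothetical values from the story above):</p>

```r
# Each coupon contains one correct insert (always 1000 lbf) and one
# wrong insert (always 500 lbf); the coupon fails at the weaker one
correct <- rep(1000, 20)
wrong <- rep(500, 20)
observed <- pmin(correct, wrong)  # what the test actually measures

mean(observed)           # 500: the misleading observed mean
mean(c(correct, wrong))  # 750: the true mean insert strength
```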
<p>In a more realistic case, we’re actually going to be dealing with parts that
have strengths drawn from the same continuous distribution. As we move on,
we’re going to assume that the strength of each individual insert is a
random variable drawn from the same continuous distribution (that is, they
are <span class="caps">IID</span>).</p>
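<p>A quick simulation (an illustrative aside, with arbitrary parameters)
shows that the weaker-of-two strength is biased low even when both
inserts come from the same distribution:</p>

```r
set.seed(42)  # arbitrary seed, for reproducibility
x1 <- rnorm(100000, mean = 1000, sd = 100)
x2 <- rnorm(100000, mean = 1000, sd = 100)

# The observed (first-failure) strength is the weaker of each pair
mean(pmin(x1, x2))
```

<p>The result is roughly $1000 - 100 / \sqrt{\pi} \approx 944$,
noticeably below the true mean of 1000.</p>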
<p>Let’s create some simulated data. We’ll start by loading a few R packages
that we’ll need.</p>
<div class="highlight"><pre><span></span><code><span class="nf">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span>
<span class="nf">library</span><span class="p">(</span><span class="n">cmstatr</span><span class="p">)</span>
<span class="nf">library</span><span class="p">(</span><span class="n">stats4</span><span class="p">)</span>
</code></pre></div>
<p>Next, we’ll create a sample of $40$ simulated insert strengths. These will be
drawn from a normal distribution with a mean of $1000$ and a standard
deviation of $100$.</p>
<div class="highlight"><pre><span></span><code><span class="n">pop_mean</span> <span class="o"><-</span> <span class="m">1000</span>
<span class="n">pop_sd</span> <span class="o"><-</span> <span class="m">100</span>
<span class="nf">set.seed</span><span class="p">(</span><span class="m">123</span><span class="p">)</span> <span class="c1"># make this example reproducible</span>
<span class="n">strength</span> <span class="o"><-</span> <span class="nf">rnorm</span><span class="p">(</span><span class="m">40</span><span class="p">,</span> <span class="n">pop_mean</span><span class="p">,</span> <span class="n">pop_sd</span><span class="p">)</span>
</code></pre></div>
<p>Now let’s calculate the mean of this sample. We expect it to be
fairly close to 1000, and indeed it is.</p>
<div class="highlight"><pre><span></span><code><span class="nf">mean</span><span class="p">(</span><span class="n">strength</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] 1004.518
</code></pre></div>
<p>And we can also calculate the standard deviation:</p>
<div class="highlight"><pre><span></span><code><span class="nf">sd</span><span class="p">(</span><span class="n">strength</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] 89.77847
</code></pre></div>
<p>For the strength of most aircraft structures, we are concerned with
a lower tolerance bound of the strength. For multiple load-path structure,
we need to calculate the B-Basis strength, which is the $95\%$ lower
confidence bound on the 10-th percentile of the strength.</p>
<p>Since we know the actual strength of all 40 inserts, we can calculate
the B-Basis based on these actual insert strengths. Ideally, the
B-Basis value that we calculate later will be close to this value.</p>
<div class="highlight"><pre><span></span><code><span class="nf">basis_normal</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">strength</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="c1">## `outliers_within_batch` not run because parameter `batch` not specified</span><span class="w"></span>
<span class="c1">## `between_batch_variability` not run because parameter `batch` not specified</span><span class="w"></span>
<span class="c1">## </span><span class="w"></span>
<span class="c1">## Call:</span><span class="w"></span>
<span class="c1">## basis_normal(x = strength)</span><span class="w"></span>
<span class="c1">## </span><span class="w"></span>
<span class="c1">## Distribution: Normal ( n = 40 )</span><span class="w"></span>
<span class="c1">## B-Basis: ( p = 0.9 , conf = 0.95 )</span><span class="w"></span>
<span class="c1">## 852.1482</span><span class="w"></span>
</code></pre></div>
<p>Now, we’ll take these $40$ insert strengths and put them into
$20$ coupons: each with two inserts. The observed coupon strength
will be set to the <em>lower</em> of the two inserts installed in
that coupon, because the coupon will fail as soon as either
one of the installed inserts fails.</p>
<div class="highlight"><pre><span></span><code><span class="n">dat</span> <span class="o"><-</span> <span class="nf">data.frame</span><span class="p">(</span>
<span class="n">ID</span> <span class="o">=</span> <span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">,</span>
<span class="n">strength1</span> <span class="o">=</span> <span class="n">strength</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="m">20</span><span class="p">],</span>
<span class="n">strength2</span> <span class="o">=</span> <span class="n">strength</span><span class="p">[</span><span class="m">21</span><span class="o">:</span><span class="m">40</span><span class="p">]</span>
<span class="p">)</span> <span class="o">%>%</span>
<span class="nf">rowwise</span><span class="p">()</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">strength_observed</span> <span class="o">=</span> <span class="nf">min</span><span class="p">(</span><span class="n">strength1</span><span class="p">,</span> <span class="n">strength2</span><span class="p">))</span> <span class="o">%>%</span>
<span class="nf">ungroup</span><span class="p">()</span>
<span class="n">dat</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## # A tibble: 20 × 4
## ID strength1 strength2 strength_observed
## <int> <dbl> <dbl> <dbl>
## 1 1 944. 893. 893.
## 2 2 977. 978. 977.
## 3 3 1156. 897. 897.
## 4 4 1007. 927. 927.
## 5 5 1013. 937. 937.
## 6 6 1172. 831. 831.
## 7 7 1046. 1084. 1046.
## 8 8 873. 1015. 873.
## 9 9 931. 886. 886.
## 10 10 955. 1125. 955.
## 11 11 1122. 1043. 1043.
## 12 12 1036. 970. 970.
## 13 13 1040. 1090. 1040.
## 14 14 1011. 1088. 1011.
## 15 15 944. 1082. 944.
## 16 16 1179. 1069. 1069.
## 17 17 1050. 1055. 1050.
## 18 18 803. 994. 803.
## 19 19 1070. 969. 969.
## 20 20 953. 962. 953.
</code></pre></div>
<p>Let’s look at the summary statistics for this data:</p>
<div class="highlight"><pre><span></span><code><span class="n">dat</span> <span class="o">%>%</span>
<span class="nf">summarise</span><span class="p">(</span>
<span class="n">mean</span> <span class="o">=</span> <span class="nf">mean</span><span class="p">(</span><span class="n">strength_observed</span><span class="p">),</span>
<span class="n">sd</span> <span class="o">=</span> <span class="nf">sd</span><span class="p">(</span><span class="n">strength_observed</span><span class="p">),</span>
<span class="n">cv</span> <span class="o">=</span> <span class="nf">cv</span><span class="p">(</span><span class="n">strength_observed</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## # A tibble: 1 × 3
## mean sd cv
## <dbl> <dbl> <dbl>
## 1 954. 75.1 0.0788
</code></pre></div>
<p>Hmmm. We see that the mean is much lower than the mean of the individual insert
strengths. Remember that the mean insert strength was $1005$, but the mean
strength of the coupons is $954$.</p>
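<p>This drop is expected. For two independent draws $X_1, X_2$ from the
same normal distribution, the expected value of the minimum has a closed
form:</p>
<p>$$
\mathrm{E}\left[\min(X_1, X_2)\right] = \mu - \frac{\sigma}{\sqrt{\pi}}
$$</p>
<p>With $\mu = 1000$ and $\sigma = 100$, this gives about $944$, which is
consistent with the sample mean of $954$ that we observed here.</p>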
<p>Next, we’ll naively calculate a B-Basis value from the measured strength. We’ll
assume a normal distribution.</p>
<div class="highlight"><pre><span></span><code><span class="n">dat</span> <span class="o">%>%</span>
<span class="nf">basis_normal</span><span class="p">(</span><span class="n">strength_observed</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>##
## Call:
## basis_normal(data = ., x = strength_observed)
##
## Distribution: Normal ( n = 20 )
## B-Basis: ( p = 0.9 , conf = 0.95 )
## 809.1911
</code></pre></div>
<p>We’ll just keep this number in mind for now and we’ll move on to
the idea of using a likelihood-based approach to calculate a
B-Basis value, considering the fact that this data is censored.</p>
<p>The way that this data is censored might not be immediately obvious.
But, each time we test one of these coupons, which contain two
inserts, we actually get two pieces of data. We get the strength
of one of the inserts. This is an <em>exact</em> value. But we also get
a second piece of data. We know that the strength of the other
insert is at least as high as the one that failed first. This
is a <em>right censored</em> value.</p>
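<p>One way to lay this out, shown here as a hypothetical sketch rather than
code from the analysis that follows, is to record two observations per
coupon at the same failure load, with a flag distinguishing the exact
observation from the right-censored one:</p>

```r
# Hypothetical illustration of the censored data layout: each coupon
# contributes its failure load twice -- once as an exact observation
# (the insert that failed) and once as a right-censored observation
# (the surviving insert, whose strength is at least the failure load).
failure_loads <- c(893, 977, 897)  # first few observed coupon strengths
censored_data <- data.frame(
  value    = rep(failure_loads, 2),
  censored = rep(c("exact", "right"), each = length(failure_loads))
)
```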
<p>In the <a href="https://www.kloppenborg.ca/2021/02/likelihood-basis-values/">previous post</a>,
I gave an expression for the likelihood function. However, that
function only considers exact observations. The expression for
the likelihood, considering censored data, is as follows
(see <a href='#Meeker_Hahn_Escobar2017' id='ref-Meeker_Hahn_Escobar2017-1'>
Meeker et al. (2017)
</a>).</p>
<p>$$
\mathcal{L}\left(\theta\right) = \prod_{i=1}^{n}
\begin{cases}
f\left(X_i;\,\theta\right) & \mbox{if } X_i \mbox{ is exact} \\
F\left(X_i;\,\theta\right) & \mbox{if } X_i \mbox{ is left censored} \\
1 - F\left(X_i;\,\theta\right) & \mbox{if } X_i \mbox{ is right censored}
\end{cases}
$$</p>
<p>Where $f()$ is the probability density function and $F()$ is the
cumulative distribution function.</p>
<p>We can implement a log-likelihood function based on this in R as follows:</p>
<div class="highlight"><pre><span></span><code><span class="n">log_likelihood_normal</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">censored</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">suppressWarnings</span><span class="p">(</span>
<span class="nf">sum</span><span class="p">(</span><span class="nf">map2_dbl</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">censored</span><span class="p">,</span> <span class="nf">function</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">ci</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">if </span><span class="p">(</span><span class="n">ci</span> <span class="o">==</span> <span class="s">"exact"</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">dnorm</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="n">sig</span><span class="p">,</span> <span class="n">log</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
<span class="p">}</span> <span class="n">else</span> <span class="nf">if </span><span class="p">(</span><span class="n">ci</span> <span class="o">==</span> <span class="s">"left"</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">pnorm</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="n">sig</span><span class="p">,</span> <span class="n">log.p</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
<span class="p">}</span> <span class="n">else</span> <span class="nf">if </span><span class="p">(</span><span class="n">ci</span> <span class="o">==</span> <span class="s">"right"</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">pnorm</span><span class="p">(</span><span class="n">xi</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="n">sig</span><span class="p">,</span> <span class="n">log.p</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">,</span>
<span class="n">lower.tail</span> <span class="o">=</span> <span class="kc">FALSE</span><span class="p">)</span>
<span class="p">}</span> <span class="n">else</span> <span class="p">{</span>
<span class="nf">stop</span><span class="p">(</span><span class="s">"Invalid value of `censored`"</span><span class="p">)</span>
<span class="p">}</span>
<span class="p">}))</span>
<span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>We can use this log-likelihood function to find the maximum-likelihood
estimates (<span class="caps">MLE</span>) of the population parameters using the <code>stats4::mle()</code>
function. First, we’ll find the <span class="caps">MLE</span> based only on the observed strength
of each coupon, taken as a single exact value.</p>
<div class="highlight"><pre><span></span><code><span class="nf">mle</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">)</span> <span class="p">{</span>
<span class="o">-</span><span class="nf">log_likelihood_normal</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">,</span> <span class="n">dat</span><span class="o">$</span><span class="n">strength_observed</span><span class="p">,</span> <span class="s">"exact"</span><span class="p">)</span>
<span class="p">},</span>
<span class="n">start</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> <span class="m">100</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>##
## Call:
## mle(minuslogl = function(mu, sig) {
## -log_likelihood_normal(mu, sig, dat$strength_observed, "exact")
## }, start = c(1000, 100))
##
## Coefficients:
## mu sig
## 953.70230 73.27344
</code></pre></div>
<p>(Note that the value of <code>start</code> is just a starting point for the
numerical optimization.)</p>
<p>Here, we get the same value of the mean that we previously calculated.</p>
<p>Now, we’ll repeat the <span class="caps">MLE</span> procedure, but now give it two pieces of
data for each coupon: one exact value, and one right-censored value.</p>
<div class="highlight"><pre><span></span><code><span class="nf">mle</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">)</span> <span class="p">{</span>
<span class="o">-</span><span class="nf">log_likelihood_normal</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span>
<span class="n">sig</span><span class="p">,</span>
<span class="nf">c</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength_observed</span><span class="p">,</span> <span class="n">dat</span><span class="o">$</span><span class="n">strength_observed</span><span class="p">),</span>
<span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="s">"exact"</span><span class="p">,</span> <span class="m">20</span><span class="p">),</span> <span class="nf">rep</span><span class="p">(</span><span class="s">"right"</span><span class="p">,</span> <span class="m">20</span><span class="p">)))</span>
<span class="p">},</span>
<span class="n">start</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> <span class="m">100</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>##
## Call:
## mle(minuslogl = function(mu, sig) {
## -log_likelihood_normal(mu, sig, c(dat$strength_observed,
## dat$strength_observed), c(rep("exact", 20), rep("right",
## 20)))
## }, start = c(1000, 100))
##
## Coefficients:
## mu sig
## 1003.90717 88.51774
</code></pre></div>
<p>The mean estimated this way is remarkably close to the true value.</p>
<p>As we did in the previous blog post, we’ll next create a function
that returns the profile likelihood based on a value of $t_p$
(the value that the proportion $p$ of the population is below).</p>
<div class="highlight"><pre><span></span><code><span class="n">profile_likelihood_normal</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">tp</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">censored</span><span class="p">)</span> <span class="p">{</span>
<span class="n">m</span> <span class="o"><-</span> <span class="nf">mle</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">)</span> <span class="p">{</span>
<span class="o">-</span><span class="nf">log_likelihood_normal</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">censored</span><span class="p">)</span>
<span class="p">},</span>
<span class="n">start</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">1000</span><span class="p">,</span> <span class="m">100</span><span class="p">)</span> <span class="c1"># A starting guess</span>
<span class="p">)</span>
<span class="n">mu_hat</span> <span class="o"><-</span> <span class="n">m</span><span class="o">@</span><span class="n">coef</span><span class="p">[</span><span class="m">1</span><span class="p">]</span>
<span class="n">sig_hat</span> <span class="o"><-</span> <span class="n">m</span><span class="o">@</span><span class="n">coef</span><span class="p">[</span><span class="m">2</span><span class="p">]</span>
<span class="n">ll_hat</span> <span class="o"><-</span> <span class="nf">log_likelihood_normal</span><span class="p">(</span><span class="n">mu_hat</span><span class="p">,</span> <span class="n">sig_hat</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span> <span class="n">censored</span><span class="p">)</span>
<span class="nf">optimise</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">sig</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">exp</span><span class="p">(</span>
<span class="nf">log_likelihood_normal</span><span class="p">(</span>
<span class="n">mu</span> <span class="o">=</span> <span class="n">tp</span> <span class="o">-</span> <span class="n">sig</span> <span class="o">*</span> <span class="nf">qnorm</span><span class="p">(</span><span class="n">p</span><span class="p">),</span>
<span class="n">sig</span> <span class="o">=</span> <span class="n">sig</span><span class="p">,</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">x</span><span class="p">,</span>
<span class="n">censored</span> <span class="o">=</span> <span class="n">censored</span>
<span class="p">)</span> <span class="o">-</span> <span class="n">ll_hat</span>
<span class="p">)</span>
<span class="p">},</span>
<span class="n">interval</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="n">sig_hat</span> <span class="o">*</span> <span class="m">5</span><span class="p">),</span>
<span class="n">maximum</span> <span class="o">=</span> <span class="kc">TRUE</span>
<span class="p">)</span><span class="o">$</span><span class="n">objective</span>
<span class="p">}</span>
</code></pre></div>
<p>The shape of this curve is as follows:</p>
<div class="highlight"><pre><span></span><code><span class="nf">data.frame</span><span class="p">(</span>
<span class="n">tp</span> <span class="o">=</span> <span class="nf">seq</span><span class="p">(</span><span class="m">700</span><span class="p">,</span> <span class="m">1000</span><span class="p">,</span> <span class="n">length.out</span> <span class="o">=</span> <span class="m">200</span><span class="p">)</span>
<span class="p">)</span> <span class="o">%>%</span>
<span class="nf">rowwise</span><span class="p">()</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">R</span> <span class="o">=</span> <span class="nf">profile_likelihood_normal</span><span class="p">(</span>
<span class="n">tp</span><span class="p">,</span>
<span class="m">0.1</span><span class="p">,</span>
<span class="nf">c</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength_observed</span><span class="p">,</span> <span class="n">dat</span><span class="o">$</span><span class="n">strength_observed</span><span class="p">),</span>
<span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="s">"exact"</span><span class="p">,</span> <span class="m">20</span><span class="p">),</span> <span class="nf">rep</span><span class="p">(</span><span class="s">"right"</span><span class="p">,</span> <span class="m">20</span><span class="p">))</span>
<span class="p">))</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">tp</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">R</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_line</span><span class="p">()</span> <span class="o">+</span>
<span class="nf">ggtitle</span><span class="p">(</span><span class="s">"Profile Likelihood for the 10th Percentile"</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-13-1" src="https://www.kloppenborg.ca/2021/02/basis-values-censored-data/basis-values-censored-data_files/figure-markdown/unnamed-chunk-13-1.png"></p>
<p>Next, we’ll find the value of $u$ that satisfies this equation:</p>
<p>$$
0.05 = \frac{
\int_{-\infty}^{u}R(t_p) d t_p
}{
\int_{-\infty}^{\infty}R(t_p) d t_p
} $$</p>
<div class="highlight"><pre><span></span><code><span class="n">fn</span> <span class="o"><-</span> <span class="nf">Vectorize</span><span class="p">(</span><span class="nf">function</span><span class="p">(</span><span class="n">tp</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">profile_likelihood_normal</span><span class="p">(</span>
<span class="n">tp</span><span class="p">,</span>
<span class="m">0.1</span><span class="p">,</span>
<span class="nf">c</span><span class="p">(</span><span class="n">dat</span><span class="o">$</span><span class="n">strength_observed</span><span class="p">,</span> <span class="n">dat</span><span class="o">$</span><span class="n">strength_observed</span><span class="p">),</span>
<span class="nf">c</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="s">"exact"</span><span class="p">,</span> <span class="m">20</span><span class="p">),</span> <span class="nf">rep</span><span class="p">(</span><span class="s">"right"</span><span class="p">,</span> <span class="m">20</span><span class="p">)))</span>
<span class="p">})</span>
<span class="n">denominator</span> <span class="o"><-</span> <span class="nf">integrate</span><span class="p">(</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">fn</span><span class="p">,</span>
<span class="n">lower</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span>
<span class="n">upper</span> <span class="o">=</span> <span class="m">1000</span>
<span class="p">)</span>
<span class="nf">uniroot</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">upper</span><span class="p">)</span> <span class="p">{</span>
<span class="n">trial_area</span> <span class="o"><-</span> <span class="nf">integrate</span><span class="p">(</span>
<span class="n">fn</span><span class="p">,</span>
<span class="n">lower</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span>
<span class="n">upper</span> <span class="o">=</span> <span class="n">upper</span>
<span class="p">)</span>
<span class="nf">return</span><span class="p">(</span><span class="n">trial_area</span><span class="o">$</span><span class="n">value</span> <span class="o">/</span> <span class="n">denominator</span><span class="o">$</span><span class="n">value</span> <span class="o">-</span> <span class="m">0.05</span><span class="p">)</span>
<span class="p">},</span>
<span class="n">interval</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">700</span><span class="p">,</span> <span class="m">1000</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## $root
## [1] 845.7739
##
## $f.root
## [1] -1.8327e-08
##
## $iter
## [1] 10
##
## $init.it
## [1] NA
##
## $estim.prec
## [1] 6.103516e-05
</code></pre></div>
<p>This value of $846$ is much higher than the value of $809$ that we found
earlier based on the coupon strength. But, this value of $846$ is a little
lower than the B-Basis of $852$ that was based on the actual strength of all
of the inserts installed.</p>
<p>One way to view the differences between these three numbers is as follows.
The B-Basis strength is related to the 10-th percentile of the strength.
But it is actually a confidence bound on the 10-th percentile. If we have only
a little bit of information about the strength, there is a lot of uncertainty
about the actual 10-th percentile, so the lower confidence bound is quite low.
If we have a lot of information about the strength, the uncertainty is small,
so the lower confidence bound is close to the actual 10-th percentile.</p>
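<p>For reference, since we know the simulation parameters, we can compute
the true 10-th percentile of the insert strength distribution that all
three of these B-Basis values are trying to bound:</p>

```r
# True 10th percentile of the insert strength distribution
# (known only because we simulated the data ourselves)
qnorm(0.1, mean = 1000, sd = 100)
#> [1] 871.8448
```

<p>All three computed values fall below this, as a lower confidence bound
should in $95\%$ of samples.</p>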
<p>When we calculated a B-Basis from the observed coupon strength, we had 20 pieces
of information. When we calculated a B-Basis from the actual insert strength,
we had 40 pieces of information. When we calculated the B-Basis value
considering the censored data, we had 40 pieces of information, but half that
information wasn’t as informative as the other half: the exact values provide
more information than the censored values.</p>Basis Values Using a Likelihood Approach2021-02-09T00:00:00-05:002021-02-09T00:00:00-05:00Stefan Kloppenborgtag:www.kloppenborg.ca,2021-02-09:/2021/02/likelihood-basis-values/<p>All materials have some variability in their strength: some pieces of a
given material are stronger than others. The design standards for civil
aircraft mandate that one must account for this material variability.
This is done by setting appropriate material allowables such that either
$90\%$ or $99\%$ of the material …</p><p>All materials have some variability in their strength: some pieces of a
given material are stronger than others. The design standards for civil
aircraft mandate that one must account for this material variability.
This is done by setting appropriate material allowables such that either
$90\%$ or $99\%$ of the material will have a strength greater than the
allowable with $95\%$ confidence. These values are referred to as
B-Basis and A-Basis values, respectively. In the language of statistics,
they are lower tolerance bounds on the material strength.</p>
<p>When you’re designing an aircraft part, one of the first steps is to
determine the allowables to which you’ll compare the stress when
determining the margin of safety. For many metals, A- or B-Basis values
are published, and the designer will use those published values as the
allowable. However, when it comes to composite materials, it is often up
to the designer to determine the A- or B-Basis value themselves.</p>
<p>The most common way of calculating Basis values is to use the
statistical methods published in Volume 1 of
<a href="https://www.cmh17.org/"><span class="caps">CMH</span>-17</a> and implemented in the R package
<a href="https://www.cmstatr.net/"><code>cmstatr</code></a> (among other implementations).
These methods are based on <em>frequentist inference</em>.</p>
<p>For example, if the data is assumed to be normally distributed, with
this frequentist approach, you would calculate the B-Basis value using
the non-central <em>t-</em>distribution (see, for example, <a href='#Krishnamoorthy_Mathew_2008' id='ref-Krishnamoorthy_Mathew_2008-1'>
Krishnamoorthy and Mathew (2008)
</a>).</p>
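<p>For the normal case, that calculation can be sketched in a few lines of
R. This is my own sketch of the standard one-sided tolerance factor, not
code from <code>cmstatr</code>:</p>

```r
# Sketch of a normal-distribution B-Basis using the non-central
# t-distribution: the sample mean minus a tolerance factor times the
# sample standard deviation. For B-Basis, p = 0.9 and conf = 0.95.
b_basis_normal_sketch <- function(x, p = 0.9, conf = 0.95) {
  n <- length(x)
  k <- qt(conf, df = n - 1, ncp = qnorm(p) * sqrt(n)) / sqrt(n)
  mean(x) - k * sd(x)
}
```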
<p>However, the frequentist approach is not the only way to calculate Basis
values: a <em>likelihood</em>-based approach can be used as well. The book
<em>Statistical Intervals</em> by <a href='#Meeker_Hahn_Escobar2017' id='ref-Meeker_Hahn_Escobar2017-1'>
Meeker et al. (2017)
</a> discusses this
approach, among other topics.</p>
<p>The basic idea of likelihood-based inference is that you can observe
some data (by doing mechanical tests, or whatever), but you don’t yet
know the population parameters, such as the mean and the variance. But,
you can say that some possible values of the population parameters are
more likely than others. For example, if you perform 18 tension tests of
a material and the results are all around 100, the likelihood that the
population mean is 100 is pretty high, but the likelihood that the
population mean is 50 is really low. You can define a mathematical
function to quantify this likelihood: this is called the <em>likelihood
function</em>.</p>
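<p>A quick sketch of this idea in R, using made-up numbers rather than the
data analyzed below:</p>

```r
# Likelihood of two candidate population means for data clustered near
# 100: the likelihood at mu = 100 dwarfs the likelihood at mu = 50.
x <- c(98, 101, 99, 102, 100)  # hypothetical test results
likelihood <- function(mu, sig) prod(dnorm(x, mean = mu, sd = sig))
likelihood(100, 2) > likelihood(50, 2)  # TRUE
```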
<p>If you just need a point-estimate of the population parameters, you can
find the highest value of this likelihood function: this is called the
maximum likelihood estimate. If you need to find an interval or a bound
(for example, the B-Basis, which is a lower tolerance bound), you can
plot this likelihood function versus the population parameters and use
this distribution of likelihood to determine a range of population
parameters that are “sufficiently likely” to be within the interval.</p>
<p>The likelihood-based approach to calculating Basis values is more
computationally expensive, but it allows you to deal with data that is
left- or right-censored, and you can use the same computational
algorithm for a wide variety of location-scale distributions. I’m
planning on writing about calculating Basis values for censored data soon.</p>
<h1>Example Data</h1>
<p>For the purpose of this blog post, we’ll look at some data that is
included in the <code>cmstatr</code> package. We’ll use this data to calculate a
B-Basis value using the more traditional frequentist approach, then
using a likelihood-based approach.</p>
<p>We’ll start by loading several R packages that we’ll need:</p>
<div class="highlight"><pre><span></span><code><span class="nf">library</span><span class="p">(</span><span class="n">cmstatr</span><span class="p">)</span>
<span class="nf">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span>
<span class="nf">library</span><span class="p">(</span><span class="n">stats4</span><span class="p">)</span>
</code></pre></div>
<p>Next, we’ll get the data that we’re going to use. We’ll use the “warp
tension” data from the <code>carbon.fabric.2</code> data set that comes with
<code>cmstatr</code>. We’ll consider only the <code>RTD</code> environmental condition.</p>
<div class="highlight"><pre><span></span><code><span class="n">carbon.fabric.2</span> <span class="o">%>%</span>
<span class="nf">filter</span><span class="p">(</span><span class="n">test</span> <span class="o">==</span> <span class="s">"WT"</span> <span class="o">&</span> <span class="n">condition</span> <span class="o">==</span> <span class="s">"RTD"</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## test condition batch panel thickness nplies strength modulus failure_mode
## 1 WT RTD A 1 0.113 14 129.224 8.733 LAB
## 2 WT RTD A 1 0.112 14 144.702 8.934 LAT,LWB
## 3 WT RTD A 1 0.113 14 137.194 8.896 LAB
## 4 WT RTD A 1 0.113 14 139.728 8.835 LAT,LWB
## 5 WT RTD A 2 0.113 14 127.286 9.220 LAB
## 6 WT RTD A 2 0.111 14 129.261 9.463 LAT
## 7 WT RTD A 2 0.112 14 130.031 9.348 LAB
## 8 WT RTD B 1 0.111 14 140.038 9.244 LAT,LGM
## 9 WT RTD B 1 0.111 14 132.880 9.267 LWT
## 10 WT RTD B 1 0.113 14 132.104 9.198 LAT
## 11 WT RTD B 2 0.114 14 137.618 9.179 LAT,LAB
## 12 WT RTD B 2 0.113 14 139.217 9.123 LAB
## 13 WT RTD B 2 0.113 14 134.912 9.116 LAT
## 14 WT RTD B 2 0.111 14 141.558 9.434 LAB / LAT
## 15 WT RTD C 1 0.108 14 150.242 9.451 LAB
## 16 WT RTD C 1 0.109 14 147.053 9.391 LGM
## 17 WT RTD C 1 0.111 14 145.001 9.318 LAT,LWB
## 18 WT RTD C 1 0.113 14 135.686 8.991 LAT / LAB
## 19 WT RTD C 1 0.112 14 136.075 9.221 LAB
## 20 WT RTD C 2 0.114 14 143.738 8.803 LAT,LGM
## 21 WT RTD C 2 0.113 14 143.715 8.893 LAT,LAB
## 22 WT RTD C 2 0.113 14 147.981 8.974 LGM,LWB
## 23 WT RTD C 2 0.112 14 148.418 9.118 LAT,LWB
## 24 WT RTD C 2 0.113 14 135.435 9.217 LAT/LAB
## 25 WT RTD C 2 0.113 14 146.285 8.920 LWT/LWB
## 26 WT RTD C 2 0.111 14 139.078 9.015 LAT
## 27 WT RTD C 2 0.112 14 146.825 9.036 LAT/LWT
## 28 WT RTD C 2 0.110 14 148.235 9.336 LWB/LAB
</code></pre></div>
<p>We really care only about the strength vector from this data, so we’ll
save that vector by itself in a variable for easy access later.</p>
<div class="highlight"><pre><span></span><code><span class="n">dat</span> <span class="o"><-</span> <span class="p">(</span><span class="n">carbon.fabric.2</span> <span class="o">%>%</span>
<span class="nf">filter</span><span class="p">(</span><span class="n">test</span> <span class="o">==</span> <span class="s">"WT"</span> <span class="o">&</span> <span class="n">condition</span> <span class="o">==</span> <span class="s">"RTD"</span><span class="p">))[[</span><span class="s">"strength"</span><span class="p">]]</span>
<span class="n">dat</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] 129.224 144.702 137.194 139.728 127.286 129.261 130.031 140.038 132.880
## [10] 132.104 137.618 139.217 134.912 141.558 150.242 147.053 145.001 135.686
## [19] 136.075 143.738 143.715 147.981 148.418 135.435 146.285 139.078 146.825
## [28] 148.235
</code></pre></div>
<h1>Frequentist B-Basis</h1>
<p>We can use the <code>cmstatr</code> package to calculate the B-Basis value from
this example data. We’re going to assume that the data follows a normal
distribution throughout this blog post.</p>
<div class="highlight"><pre><span></span><code><span class="nf">basis_normal</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">dat</span><span class="p">,</span> <span class="n">p</span> <span class="o">=</span> <span class="m">0.9</span><span class="p">,</span> <span class="n">conf</span> <span class="o">=</span> <span class="m">0.95</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>##
## Call:
## basis_normal(x = dat, p = 0.9, conf = 0.95)
##
## Distribution: Normal ( n = 28 )
## B-Basis: ( p = 0.9 , conf = 0.95 )
## 127.5415
</code></pre></div>
<p>So using this approach, we get a B-Basis value of $127.54$.</p>
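<p>Under the normality assumption, the calculation behind <code>basis_normal()</code> reduces to $\bar{x} - k s$, where $k$ is a one-sided tolerance factor derived from a noncentral $t$ distribution. The following plain-Python sketch shows the arithmetic; note that $k \approx 1.799$ for $n = 28$, $p = 0.9$, $conf = 0.95$ is taken here as an assumption rather than computed:</p>

```python
from statistics import mean, stdev

# Warp-tension RTD strength data from above
dat = [129.224, 144.702, 137.194, 139.728, 127.286, 129.261, 130.031,
       140.038, 132.880, 132.104, 137.618, 139.217, 134.912, 141.558,
       150.242, 147.053, 145.001, 135.686, 136.075, 143.738, 143.715,
       147.981, 148.418, 135.435, 146.285, 139.078, 146.825, 148.235]

# One-sided tolerance factor for n = 28, p = 0.90, conf = 0.95.  The exact
# value comes from a noncentral t distribution; 1.799 is assumed here.
k = 1.799
b_basis = mean(dat) - k * stdev(dat)
print(round(b_basis, 2))  # about 127.54
```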
<h1>Likelihood-Based B-Basis</h1>
<p>The first step in implementing a likelihood-based approach is to define
a likelihood function. This function is the product of the probability
density function (<span class="caps">PDF</span>) at each observation ($X_i$), given a set of
population parameters ($\theta$) (see <a href='#Wasserman_2004' id='ref-Wasserman_2004-1'>
Wasserman (2004)
</a>).</p>
<p>$$
\mathcal{L}\left(\theta\right) = \prod_{i=1}^{n} f\left(X_i;\,\theta\right) $$</p>
<p>We’ll actually implement a log-likelihood function in R because taking a
log-transform avoids some numerical issues. This log-likelihood function
will take three arguments: the two parameters of the distribution (<code>mu</code>
and <code>sigma</code>) and a vector of the data.</p>
<div class="highlight"><pre><span></span><code><span class="n">log_likelihood_normal</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">suppressWarnings</span><span class="p">(</span>
<span class="nf">sum</span><span class="p">(</span>
<span class="nf">dnorm</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">mean</span> <span class="o">=</span> <span class="n">mu</span><span class="p">,</span> <span class="n">sd</span> <span class="o">=</span> <span class="n">sig</span><span class="p">,</span> <span class="n">log</span> <span class="o">=</span> <span class="kc">TRUE</span><span class="p">)</span>
<span class="p">)</span>
<span class="p">)</span>
<span class="p">}</span>
</code></pre></div>
<p>We can use this log-likelihood function to find the maximum-likelihood
estimates (<span class="caps">MLE</span>) of the population parameters using the <code>stats4::mle()</code>
function. This function takes the negative log-likelihood function and a
starting guess for the parameters.</p>
<div class="highlight"><pre><span></span><code><span class="nf">mle</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">)</span> <span class="p">{</span>
<span class="o">-</span><span class="nf">log_likelihood_normal</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">,</span> <span class="n">dat</span><span class="p">)</span>
<span class="p">},</span>
<span class="n">start</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">130</span><span class="p">,</span> <span class="m">6.5</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>##
## Call:
## mle(minuslogl = function(mu, sig) {
## -log_likelihood_normal(mu, sig, dat)
## }, start = c(130, 6.5))
##
## Coefficients:
## mu sig
## 139.626036 6.594905
</code></pre></div>
<p>We will be denoting these maximum likelihood estimates as $\hat\mu$ and
$\hat\sigma$. They match the sample mean and sample standard deviation
within a reasonable tolerance, but are not exactly equal.</p>
<div class="highlight"><pre><span></span><code><span class="nf">mean</span><span class="p">(</span><span class="n">dat</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] 139.6257
</code></pre></div>
<div class="highlight"><pre><span></span><code><span class="nf">sd</span><span class="p">(</span><span class="n">dat</span><span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## [1] 6.716047
</code></pre></div>
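<p>The small discrepancy has a simple source: the MLE of $\sigma$ for a normal distribution divides by $n$, while the sample standard deviation divides by $n - 1$. A quick check using the values printed above:</p>

```python
import math

n = 28
s = 6.716047  # sample standard deviation, sd(dat)

# Rescale the (n - 1)-denominator estimate to the n-denominator MLE
sigma_mle = s * math.sqrt((n - 1) / n)
print(round(sigma_mle, 3))  # about 6.595
```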
<p>The relative likelihood is the ratio between the value of the likelihood
function evaluated at a given set of parameters to the value of the
likelihood function evaluated at the <span class="caps">MLE</span> of the parameters. The relative
likelihood would then be a function with two arguments: one for each of
the parameters $\mu$ and $\sigma$. To reduce the number of arguments,
<a href='#Meeker_Hahn_Escobar2017' id='ref-Meeker_Hahn_Escobar2017-2'>
Meeker et al. (2017)
</a> use a <em>profile likelihood</em> function instead.
This is the same as the relative likelihood, but it is maximized with respect
to $\sigma$, as defined below:</p>
<p>$$
R\left(\mu\right) = \max_\sigma \left[\frac{\mathcal{L}\left(\mu, \sigma\right)}{\mathcal{L}\left(\hat\mu, \hat\sigma\right)}\right] $$</p>
<p>When we’re trying to calculate a Basis value, we don’t really care about
the mean as a population parameter. Instead, we care about a particular
proportion of the population. Since a normal distribution (or any other
location-scale distribution) is uniquely defined by two parameters,
<a href='#Meeker_Hahn_Escobar2017' id='ref-Meeker_Hahn_Escobar2017-3'>
Meeker et al. (2017)
</a> note that you can use two alternate
parameters instead. In our case, we’ll keep $\sigma$ as one of the
parameters, but we’ll use $t_p$ as the other instead. Here, $t_p$ is the
value that the proportion $p$ of the population falls below. For
example, $t_{0.1}$ would represent the 10th percentile of the population.</p>
<p>We can convert between $\mu$ and $t_p$ as follows:</p>
<p>$$
\mu = t_p - \sigma \Phi^{-1}\left(p\right) $$</p>
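<p>A quick numeric check of this conversion (plain Python; the MLEs are rounded from the values above):</p>

```python
from statistics import NormalDist

p = 0.1
mu_hat, sig_hat = 139.626, 6.595  # MLEs from above (rounded)

z_p = NormalDist().inv_cdf(p)  # Phi^{-1}(0.1), about -1.2816
t_p = mu_hat + sig_hat * z_p   # the 10th percentile of the fitted distribution
mu_back = t_p - sig_hat * z_p  # converting back recovers mu

print(round(t_p, 2))  # about 131.17
```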
<p>Given this re-parameterization, we can implement the profile likelihood
function as follows:</p>
<div class="highlight"><pre><span></span><code><span class="n">profile_likelihood_normal</span> <span class="o"><-</span> <span class="nf">function</span><span class="p">(</span><span class="n">tp</span><span class="p">,</span> <span class="n">p</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="n">m</span> <span class="o"><-</span> <span class="nf">mle</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">)</span> <span class="p">{</span>
<span class="o">-</span><span class="nf">log_likelihood_normal</span><span class="p">(</span><span class="n">mu</span><span class="p">,</span> <span class="n">sig</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
<span class="p">},</span>
<span class="n">start</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">130</span><span class="p">,</span> <span class="m">6.5</span><span class="p">)</span>
<span class="p">)</span>
<span class="n">mu_hat</span> <span class="o"><-</span> <span class="n">m</span><span class="o">@</span><span class="n">coef</span><span class="p">[</span><span class="m">1</span><span class="p">]</span>
<span class="n">sig_hat</span> <span class="o"><-</span> <span class="n">m</span><span class="o">@</span><span class="n">coef</span><span class="p">[</span><span class="m">2</span><span class="p">]</span>
<span class="n">ll_hat</span> <span class="o"><-</span> <span class="nf">log_likelihood_normal</span><span class="p">(</span><span class="n">mu_hat</span><span class="p">,</span> <span class="n">sig_hat</span><span class="p">,</span> <span class="n">x</span><span class="p">)</span>
<span class="nf">optimise</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">sig</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">exp</span><span class="p">(</span>
<span class="nf">log_likelihood_normal</span><span class="p">(</span>
<span class="n">mu</span> <span class="o">=</span> <span class="n">tp</span> <span class="o">-</span> <span class="n">sig</span> <span class="o">*</span> <span class="nf">qnorm</span><span class="p">(</span><span class="n">p</span><span class="p">),</span>
<span class="n">sig</span> <span class="o">=</span> <span class="n">sig</span><span class="p">,</span>
<span class="n">x</span> <span class="o">=</span> <span class="n">x</span>
<span class="p">)</span> <span class="o">-</span> <span class="n">ll_hat</span>
<span class="p">)</span>
<span class="p">},</span>
<span class="n">interval</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="n">sig_hat</span> <span class="o">*</span> <span class="m">5</span><span class="p">),</span>
<span class="n">maximum</span> <span class="o">=</span> <span class="kc">TRUE</span>
<span class="p">)</span><span class="o">$</span><span class="n">objective</span>
<span class="p">}</span>
</code></pre></div>
<p>We can visualize the profile likelihood function:</p>
<div class="highlight"><pre><span></span><code><span class="nf">data.frame</span><span class="p">(</span>
<span class="n">tp</span> <span class="o">=</span> <span class="nf">seq</span><span class="p">(</span><span class="m">120</span><span class="p">,</span> <span class="m">140</span><span class="p">,</span> <span class="n">length.out</span> <span class="o">=</span> <span class="m">200</span><span class="p">)</span>
<span class="p">)</span> <span class="o">%>%</span>
<span class="nf">rowwise</span><span class="p">()</span> <span class="o">%>%</span>
<span class="nf">mutate</span><span class="p">(</span><span class="n">R</span> <span class="o">=</span> <span class="nf">profile_likelihood_normal</span><span class="p">(</span><span class="n">tp</span><span class="p">,</span> <span class="m">0.1</span><span class="p">,</span> <span class="n">dat</span><span class="p">))</span> <span class="o">%>%</span>
<span class="nf">ggplot</span><span class="p">(</span><span class="nf">aes</span><span class="p">(</span><span class="n">x</span> <span class="o">=</span> <span class="n">tp</span><span class="p">,</span> <span class="n">y</span> <span class="o">=</span> <span class="n">R</span><span class="p">))</span> <span class="o">+</span>
<span class="nf">geom_line</span><span class="p">()</span> <span class="o">+</span>
<span class="nf">ggtitle</span><span class="p">(</span><span class="s">"Profile Likelihood for the 10th Percentile"</span><span class="p">)</span>
</code></pre></div>
<p><img alt="unnamed-chunk-9-1" src="https://www.kloppenborg.ca/2021/02/likelihood-basis-values/likelihood-basis-values_files/figure-markdown/unnamed-chunk-9-1.png"></p>
<p>The way to interpret this plot is that it’s quite unlikely that the true
value of $t_p$ is 120, and it’s unlikely that it’s 140, but it’s pretty
likely that it’s around 131.</p>
<p>However, when we’re calculating Basis values, we aren’t trying to find
the most likely value of $t_p$: we’re trying to find a lower bound of
the value of $t_p$.</p>
<p>The asymptotic distribution of $-2\log R$ is a $\chi^2$ distribution. If
you’re working with large samples, you can use this fact to determine
the lower bound of $t_p$. However, for the sample sizes typically used
for composite material testing, the actual distribution is far enough
from a $\chi^2$ distribution that you can’t rely on this asymptotic
result.</p>
<p>Instead, we can use numerical integration to find the lower tolerance
bound. We can find a value of $t_p$, which we’ll call $u$, where
$5\%$ of the area under the $R$ curve is to its left. This will give
the $95\%$ lower confidence bound on the population parameter. This can
be written as follows. We’ll use numerical root finding to solve this
expression for $u$.</p>
<p>$$
0.05 = \frac{
\int_{-\infty}^{u}R(t_p) d t_p
}{
\int_{-\infty}^{\infty}R(t_p) d t_p
} $$</p>
<p>Since the value of $R$ vanishes as we move far from about 130, we won’t
actually integrate from $-\infty$ to $\infty$, but rather integrate
between two values that are relatively far from the peak of the $R$ curve.</p>
<p>We can implement this in the R language as follows. First, we’ll find
the value of the denominator.</p>
<div class="highlight"><pre><span></span><code><span class="n">fn</span> <span class="o"><-</span> <span class="nf">Vectorize</span><span class="p">(</span><span class="nf">function</span><span class="p">(</span><span class="n">tp</span><span class="p">)</span> <span class="p">{</span>
<span class="nf">profile_likelihood_normal</span><span class="p">(</span><span class="n">tp</span><span class="p">,</span> <span class="m">0.1</span><span class="p">,</span> <span class="n">dat</span><span class="p">)</span>
<span class="p">})</span>
<span class="n">denominator</span> <span class="o"><-</span> <span class="nf">integrate</span><span class="p">(</span>
<span class="n">f</span> <span class="o">=</span> <span class="n">fn</span><span class="p">,</span>
<span class="n">lower</span> <span class="o">=</span> <span class="m">100</span><span class="p">,</span>
<span class="n">upper</span> <span class="o">=</span> <span class="m">150</span>
<span class="p">)</span>
<span class="n">denominator</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## 4.339919 with absolute error < 8.9e-07
</code></pre></div>
<p>Next, we’ll use <code>uniroot()</code> to solve for the value of $u$ that
makes the ratio of the two areas equal to $0.05$:</p>
<div class="highlight"><pre><span></span><code><span class="nf">uniroot</span><span class="p">(</span>
<span class="nf">function</span><span class="p">(</span><span class="n">upper</span><span class="p">)</span> <span class="p">{</span>
<span class="n">trial_area</span> <span class="o"><-</span> <span class="nf">integrate</span><span class="p">(</span>
<span class="n">fn</span><span class="p">,</span>
<span class="n">lower</span> <span class="o">=</span> <span class="m">0</span><span class="p">,</span>
<span class="n">upper</span> <span class="o">=</span> <span class="n">upper</span>
<span class="p">)</span>
<span class="nf">return</span><span class="p">(</span><span class="n">trial_area</span><span class="o">$</span><span class="n">value</span> <span class="o">/</span> <span class="n">denominator</span><span class="o">$</span><span class="n">value</span> <span class="o">-</span> <span class="m">0.05</span><span class="p">)</span>
<span class="p">},</span>
<span class="n">interval</span> <span class="o">=</span> <span class="nf">c</span><span class="p">(</span><span class="m">100</span><span class="p">,</span> <span class="m">150</span><span class="p">)</span>
<span class="p">)</span>
</code></pre></div>
<div class="highlight"><pre><span></span><code>## $root
## [1] 127.4914
##
## $f.root
## [1] -3.810654e-08
##
## $iter
## [1] 14
##
## $init.it
## [1] NA
##
## $estim.prec
## [1] 6.103516e-05
</code></pre></div>
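<p>The whole procedure can also be sketched in a self-contained way using only the Python standard library. This is a sketch, not a translation of the R code: the closed-form normal MLEs replace <code>mle()</code>, a golden-section search replaces <code>optimise()</code>, and the grid resolution, integration bounds, and search range for $\sigma$ are choices of this sketch.</p>

```python
import math
from statistics import NormalDist

# Warp-tension RTD strength data from above
dat = [129.224, 144.702, 137.194, 139.728, 127.286, 129.261, 130.031,
       140.038, 132.880, 132.104, 137.618, 139.217, 134.912, 141.558,
       150.242, 147.053, 145.001, 135.686, 136.075, 143.738, 143.715,
       147.981, 148.418, 135.435, 146.285, 139.078, 146.825, 148.235]
n = len(dat)
z_p = NormalDist().inv_cdf(0.1)  # Phi^{-1}(0.1), about -1.2816

def loglik(mu, sig):
    # Normal log-likelihood: the sum of the log-PDF at each observation
    return (-n / 2 * math.log(2 * math.pi * sig ** 2)
            - sum((x - mu) ** 2 for x in dat) / (2 * sig ** 2))

# Closed-form MLEs for a normal distribution
mu_hat = sum(dat) / n
sig_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in dat) / n)
ll_hat = loglik(mu_hat, sig_hat)

def profile_R(tp):
    # Maximize the relative likelihood over sigma (golden-section search),
    # with mu tied to sigma through mu = tp - sigma * Phi^{-1}(p)
    f = lambda sig: loglik(tp - sig * z_p, sig)
    a, b = 0.5, 60.0
    g = (math.sqrt(5) - 1) / 2
    for _ in range(80):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            a = c
        else:
            b = d
    return math.exp(f((a + b) / 2) - ll_hat)

# Evaluate R on a grid over [100, 150] and integrate (trapezoid rule)
grid = [100 + 0.1 * i for i in range(501)]
R = [profile_R(tp) for tp in grid]
cum = [0.0]
for i in range(1, len(grid)):
    cum.append(cum[-1] + 0.1 * (R[i] + R[i - 1]) / 2)

# Find u where the area to the left is 5% of the total area
target = 0.05 * cum[-1]
i = next(j for j, c in enumerate(cum) if c >= target)
u = grid[i - 1] + 0.1 * (target - cum[i - 1]) / (cum[i] - cum[i - 1])
print(round(u, 2))
```

With this grid resolution the result agrees with the R calculation to about two decimal places.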
<p>The B-Basis value that we get using this approach is $127.49$. This is
quite close to $127.54$, which was the value that we got using the
frequentist approach.</p>
<p>In a simple case like this data set, it wouldn’t be worth the extra
effort of using a likelihood-based approach to calculating the Basis
value, but we have demonstrated that this approach does work.</p>
<p>In a later blog post, we’ll explore a case where it is worth the extra
effort. (<em>Edit: that post is <a href="https://www.kloppenborg.ca/2021/02/basis-values-censored-data/">here</a></em>)</p>cmstatr: Composite Material Data Statistics in R2020-07-22T00:00:00-04:002020-07-22T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2020-07-22:/2020/07/cmstatr/<p>From what I’ve seen, a lot of the statistical analysis of data from composite
materials is done in <span class="caps">MS</span> Excel. There are a number of very good tools
for doing this analysis in <span class="caps">MS</span> Excel:
<a href="https://www.niar.wichita.edu/agate/Documents/default.htm"><span class="caps">ASAP</span></a>,
<a href="http://www.niar.wichita.edu/coe/NCAMP_Documents/Programs/HYTEQ%20Feb%207%20%202011.xls"><span class="caps">HYTEQ</span></a>,
<span class="caps">STAT</span>-17, and more recently,
<a href="https://www.cmh17.org/RESOURCES/StatisticsSoftware.aspx"><span class="caps">CMH17</span>-<span class="caps">STATS</span></a>.
I expect that the …</p><p>From what I’ve seen, a lot of the statistical analysis of data from composite
materials is done in <span class="caps">MS</span> Excel. There are a number of very good tools
for doing this analysis in <span class="caps">MS</span> Excel:
<a href="https://www.niar.wichita.edu/agate/Documents/default.htm"><span class="caps">ASAP</span></a>,
<a href="http://www.niar.wichita.edu/coe/NCAMP_Documents/Programs/HYTEQ%20Feb%207%20%202011.xls"><span class="caps">HYTEQ</span></a>,
<span class="caps">STAT</span>-17, and more recently,
<a href="https://www.cmh17.org/RESOURCES/StatisticsSoftware.aspx"><span class="caps">CMH17</span>-<span class="caps">STATS</span></a>.
I expect that the reason for the popularity of <span class="caps">MS</span> Excel for this
application is that everyone in the industry has <span class="caps">MS</span> Excel
installed on their computer and <span class="caps">MS</span> Excel is easy to use.</p>
<p>If you’ve read my blog before, you’ll know that I think that
<a href="https://www.kloppenborg.ca/2019/06/reproducibility/">reproducibility</a> is important for engineering
calculations. In my view, this includes statistical analysis. If the analysis
isn’t reproducible, how does a reviewer — either now or in the future — know
if it’s right?</p>
<p>The current <span class="caps">MS</span> Excel tools are typically password protected so that
users can’t view
the macros that perform the calculations. I suspect
that this was done with the best of intentions in order to prevent
users from changing the code. But it also means that users
can’t verify that the code is correct, or check if there are any unstated
assumptions made.</p>
<p>To allow statistical analysis of composite material data using open-source
software, I’ve written a package for the
<a href="https://www.r-project.org/">R programming language</a>
that implements the statistical
methods described in <a href="https://www.cmh17.org"><span class="caps">CMH</span>-17-1G</a>.
This package, <a href="https://www.cmstatr.net"><code>cmstatr</code></a>, has been released on
<a href="https://cran.r-project.org/package=cmstatr"><span class="caps">CRAN</span></a>.
There is also a brief discussion of this package in a
<a href="https://doi.org/10.21105/joss.02265">paper published in the Journal of Open Source Software</a>.</p>
<p>This R package allows statistical analysis to be performed using
open-source tools — which can be verified by the user — and allows
statistical analysis reports to be written at the same time that the
analysis is performed, using <code>R-Notebooks</code> (see my
<a href="https://www.kloppenborg.ca/2019/10/pandoc-report-templates/">earlier post</a>).</p>
<p>I’ve tried to write the functions in a consistent manner so that it’s easier
to learn how to use the package. I’ve also written functions to work well
with the <a href="https://www.tidyverse.org/"><code>tidyverse</code></a> set of packages.</p>
<p>There are some examples of how to use the <code>cmstatr</code> package in
<a href="https://www.cmstatr.net/articles/cmstatr_Tutorial.html">this vignette</a>.</p>
<p>I hope that people find this package useful. If you use this package
and find a bug, have feedback or would like a feature added, please raise
an issue on
<a href="https://github.com/ComtekAdvancedStructures/cmstatr/issues">GitHub</a>.</p>Tracking Issues using Jupyter Notebooks2020-04-12T00:00:00-04:002020-04-12T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2020-04-12:/2020/04/tracking-issues/<p><em>Edit (26-May-2022): This post is largely obsolete, now that GitHub is able to
<a href="https://github.blog/2022-05-19-math-support-in-markdown/">render math in Markdown documents</a>,
including issues. I’m keeping this post up for historical reasons, but I’d
now recommend that you use GitHub Issues directly and include mathematical
notation as needed.</em></p>
<p>I’m currently …</p><p><em>Edit (26-May-2022): This post is largely obsolete, now that GitHub is able to
<a href="https://github.blog/2022-05-19-math-support-in-markdown/">render math in Markdown documents</a>,
including issues. I’m keeping this post up for historical reasons, but I’d
now recommend that you use GitHub Issues directly and include mathematical
notation as needed.</em></p>
<p>I’m currently collaborating on a paper. My collaborator and I are writing
the paper using LaTeX and we’re using git to track and share changes to
the manuscript. We currently have a shared repository on
<a href="https://www.github.com">GitHub</a>.</p>
<p>GitHub has a lot of great features for collaborating on software — after all
that’s why it was developed. The “Issues” feature in a repository is
particularly useful. It allows you to discuss problems and
track the resolution of those problems. Text formatting is supported
in GitHub Issues using Markdown. In many flavors of markdown, you can
also embed math using LaTeX syntax. Unfortunately, GitHub flavored
markdown
<a href="https://github.com/github/markup/issues/274">does not support math</a>
(<em>Edit: Note that GitHub flavored markdown now <strong>does</strong> support math</em>).
This is probably fine for the vast majority of software projects. However,
it is a problem when we’re trying to discuss a mathematical model.</p>
<p>Several people on the internet have suggested various solutions to this
shortcoming. Some have suggested using an external engine to render your
math as an image, then embed that image in markdown. This works, but
I think it’s cumbersome.</p>
<p>Several others have suggested using a <a href="https://jupyter.org">Jupyter Notebook</a>,
which GitHub does
actually render. I think that this is a better solution, and this is the
solution that I’m planning on using with my collaborator.</p>
<h1>Implementation Summary</h1>
<p>In our git repository, I’m creating a folder called <code>issues-open</code>. Inside
this folder is a set of Jupyter Notebooks, one per issue. Each collaborator
can review these Notebooks, which conveniently get rendered on the
GitHub web interface.
When a collaborator has something to add to the issue, they can fire up
their Jupyter instance and make some changes — either by adding new
cells to the bottom of the notebook, or making changes to the existing text —
and committing and pushing the changes. We’ve adopted the practice of
starting each cell with a heading with the name of the author of that cell.
This way, the Notebook looks a bit like a conversation.</p>
<h1>Launching Jupyter Notebooks</h1>
<p>We’re using a <a href="https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html">conda environment</a>
for Python so that we’re synced up on the versions of each package we’re using.
So, the first step will be creating the conda environment from the environment
<span class="caps">YAML</span> file. In our case, this would look like this:</p>
<div class="highlight"><pre><span></span><code>conda env create -f environment.yml
</code></pre></div>
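<p>For reference, a minimal <code>environment.yml</code> might look something like this (the environment name, packages, and versions are illustrative, not from our actual project):</p>

```yaml
name: my-environment
channels:
  - defaults
dependencies:
  - python=3.8
  - jupyter
  - numpy
```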
<p>This only needs to be done once on each computer. Once that’s been done, you
just need to activate the environment. This is basically just telling your
terminal that you want to use that version of Python. This can be accomplished
like the following (obviously, replace the name of the environment with the
correct name):</p>
<div class="highlight"><pre><span></span><code>conda activate my-environment
</code></pre></div>
<p>Now, you can launch the Jupyter Notebook session using the following
command. Once you run it, your web browser should open and allow you to
create new notebooks and edit existing ones.</p>
<div class="highlight"><pre><span></span><code>jupyter notebook
</code></pre></div>
<h1>Collaborating on Issues</h1>
<p>The Jupyter Notebook interface is relatively straightforward and doesn’t need
much discussion here. Most of the important features are available through the
menus. There are keyboard shortcuts that come in handy, which can be
found <a href="https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330">here</a>.</p>
<p>Jupyter notebooks comprise a set of cells. The basic types of cells are
markdown, code and raw. We’ll ignore raw cells here. Markdown cells contain
text styled using
<a href="https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html">markdown syntax</a>.
Code cells contain executable code. In our case, this will all be Python code.</p>
<p>If there is any code in the notebook, it’s important to realize that it
runs interactively. You execute one code cell at a time. You don’t have to
execute them in order either. So, if the code has side effects — like
changing a global variable — the order that you run the cells in makes a
difference. I think it’s good practice to restart your Python interpreter
and re-run all the cells before committing a notebook in git. To do this, just
click <code>Kernel</code> / <code>Restart & Run All</code>. This guarantees that the cells were
run in order and have repeatable output.</p>
<p>The other advantage to restarting the kernel and re-running all the cells
before committing is to avoid extraneous changes being tracked by git.
The notebook files include a counter indicating the order in which the cells
were executed. The first cell to be executed will have a counter value of 1,
the second will have a value of 2, etcetera. If you execute the first
five cells, then execute the first one again, it will now have a counter
value of 6.
If you’ve been playing around with a notebook for a while,
all those counters will be incremented even higher. Even if you make no real
changes to the notebook, git will register these counter changes as changes
that need to be committed and tracked. You really only want the real changes
to be tracked, and the easiest way to do this is to ensure that the code
cells are executed in order starting from an execution count of one.</p>
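<p>If you’d rather not rely on remembering to restart and re-run, the counters can also be reset programmatically. The <code>.ipynb</code> format is just <span class="caps">JSON</span>, with an <code>execution_count</code> field on code cells and on <code>execute_result</code> outputs. A sketch of a small helper (hypothetical — not part of our actual workflow):</p>

```python
def strip_execution_counts(nb):
    """Reset execution counters in a parsed notebook dict, in place."""
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["execution_count"] = None
            for out in cell.get("outputs", []):
                if "execution_count" in out:
                    out["execution_count"] = None
    return nb

# A tiny notebook fragment to demonstrate the reset
nb = {"cells": [
    {"cell_type": "code", "execution_count": 6,
     "outputs": [{"output_type": "execute_result", "execution_count": 6}]},
    {"cell_type": "markdown", "source": ["# A heading"]},
]}
strip_execution_counts(nb)
print(nb["cells"][0]["execution_count"])  # None
```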
<h1>Closing an Issue</h1>
<p>When it’s time to close an issue, whoever closes the issue simply
moves the Jupyter Notebook discussing the issue to a folder called
<code>issues-closed</code>. This should be a <code>git-mv</code> so that the history is maintained.</p>
<p>As an example, to close the issue discussed in the Notebook
<code>reorder-model-development.ipynb</code>, the command would be:</p>
<div class="highlight"><pre><span></span><code>git mv issues-open/reorder-model-development.ipynb issues-closed/
</code></pre></div>Pandoc Report Templates2019-10-29T00:00:00-04:002019-10-29T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2019-10-29:/2019/10/pandoc-report-templates/<p>The main benefit of using Notebooks (R Notebooks or Jupyter Notebooks) is
that the document is reproducible: the reader knows exactly how the results
of the analysis were obtained. I wrote about the use of Notebooks in an
<a href="https://www.kloppenborg.ca/2019/06/reproducibility/">earlier post.</a></p>
<p>Most organizations have a certain report format: a certain cover …</p><p>The main benefit of using Notebooks (R Notebooks or Jupyter Notebooks) is
that the document is reproducible: the reader knows exactly how the results
of the analysis were obtained. I wrote about the use of Notebooks in an
<a href="https://www.kloppenborg.ca/2019/06/reproducibility/">earlier post.</a></p>
<p>Most organizations have a certain report format: a certain cover sheet layout,
a certain font, a log of revisions, etcetera. For the most part, organizations
have an <span class="caps">MS</span> Word template for this report format. If you want to use a Notebook
for your analysis and to write your report, you have a few options:</p>
<ul>
<li>You could write front matter in <span class="caps">MS</span> Word using your company’s report template
and then attach the Notebook as an appendix.</li>
<li>You could also use <code>Pandoc</code> (more about what this is later) to convert the
Notebook into a .docx file and then merge it into the report template.</li>
<li>You could create your own <code>Pandoc</code> template to convert a Notebook directly into
a <span class="caps">PDF</span> with the correct formatting.</li>
</ul>
<p>The first option of attaching a Notebook as an appendix to a report otherwise
created in <span class="caps">MS</span> Word is effective, but it means that you need to maintain two
different files: the <span class="caps">MS</span> Word report and the Notebook itself. The second option
of exporting the Notebook to <span class="caps">MS</span> Word and merging it into the template is
problematic when it comes to document revisions. If part of the analysis
is revised, there is a temptation to change the affected part by either only
re-exporting that section from the Notebook into docx, or worse, making the
change directly in <span class="caps">MS</span> Word. In both cases, there is the possibility of
breaking the reproducibility. For example, let’s say that in your report you
define some constants at the beginning and do some math using these constants:</p>
<div class="highlight"><pre><span></span><code><span class="n">P</span> <span class="o">=</span> <span class="mi">1000</span>
<span class="n">A1</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">A2</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">sigma1</span> <span class="o">=</span> <span class="n">P</span> <span class="o">/</span> <span class="n">A1</span>
<span class="nb">print</span><span class="p">(</span><span class="n">sigma1</span><span class="p">)</span>
<span class="c1"># 500</span>
<span class="n">sigma2</span> <span class="o">=</span> <span class="n">P</span> <span class="o">/</span> <span class="n">A2</span>
<span class="nb">print</span><span class="p">(</span><span class="n">sigma2</span><span class="p">)</span>
<span class="c1"># 250</span>
</code></pre></div>
<p>Now let’s say that you ask your new intern to revise the document so that
$P = 1200$. They just edit the <span class="caps">MS</span> Word version of the report thinking that
they will save some time. They don’t notice that $P$ is used twice in the
calculation and only update the result from the first time it’s used. Now
the report reads:</p>
<div class="highlight"><pre><span></span><code><span class="n">P</span> <span class="o">=</span> <span class="mi">1200</span>
<span class="n">A1</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">A2</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">sigma1</span> <span class="o">=</span> <span class="n">P</span> <span class="o">/</span> <span class="n">A1</span>
<span class="nb">print</span><span class="p">(</span><span class="n">sigma1</span><span class="p">)</span>
<span class="c1"># 600</span>
<span class="n">sigma2</span> <span class="o">=</span> <span class="n">P</span> <span class="o">/</span> <span class="n">A2</span>
<span class="nb">print</span><span class="p">(</span><span class="n">sigma2</span><span class="p">)</span>
<span class="c1"># 250</span>
</code></pre></div>
<p>The report is now wrong. In a simple case like this, you’ll probably notice
the error when you review your intern’s work, but if the math were
significantly more complex, there is a good chance that you wouldn’t catch
the newly introduced error.</p>
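<p>By contrast, when the revision is made in the Notebook source and the whole
Notebook is re-run, every result is recomputed from the single definition of
P, so the two stresses cannot drift apart:</p>

```python
# Same calculation as in the report, re-run after revising P.
P = 1200
A1 = 2
A2 = 4
sigma1 = P / A1
print(sigma1)
# 600.0
sigma2 = P / A2
print(sigma2)
# 300.0  (both results update together)
```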
<p>For this reason, I think that the best option is to create a <code>Pandoc</code> template
for your company’s report template. This means that you’ll be creating a <span class="caps">PDF</span>
directly from the Notebook. In order to revise the report, you have to re-run
the Notebook — the whole Notebook.</p>
<p>For those unfamiliar with <a href="https://pandoc.org/"><code>Pandoc</code></a>, it is a program for
converting between
various file formats. It’s also free and open-source software. Commonly, it’s
used for converting from Markdown into <span class="caps">HTML</span> or <span class="caps">PDF</span> (actually, <code>Pandoc</code> converts
to a <a href="https://www.latex-project.org/">LaTeX</a> format and LaTeX converts to <span class="caps">PDF</span>,
but this happens transparently).
<code>Pandoc</code> can also convert into <span class="caps">MS</span> Word (.docx) and several other formats.</p>
<p>When I decided to create a corporate format for use with notebooks, I
looked at the types of notebooks that we use. Generally, statistics are
done in an <a href="https://bookdown.org/yihui/rmarkdown/notebook.html">R-Notebook</a>
and other analysis is done in a <a href="https://jupyter.org/">Jupyter notebook</a>.
Unfortunately, R-Notebooks and Jupyter Notebooks use different templates.
R-Notebooks use <code>pandoc</code> templates, while Jupyter uses its own template.
Fortunately, there is a workaround. Jupyter is able to export to markdown,
which can be read by <code>pandoc</code> and translated to <span class="caps">PDF</span> using a pandoc template.
Thus, I made the decision to write a <code>pandoc</code> template.</p>
<p>When <code>pandoc</code> converts a markdown file to <span class="caps">PDF</span>, it actually uses LaTeX.
The <code>pandoc</code> template is a template for converting markdown
into LaTeX. <code>Pandoc</code> then calls <code>pdflatex</code> to turn this <code>.tex</code> file into
a <span class="caps">PDF</span>.</p>
<p>When I first started figuring out how to write a template for converting
markdown to <span class="caps">PDF</span>, I thought I was going to have to write a LaTeX class or style.
I got scared. LaTeX classes are not for the faint of heart. But, I soon
realized that I didn’t actually have to do that. The <code>pandoc</code> template
that I needed to write was just a regular LaTeX document that has some
parameters that <code>pandoc</code> can fill in. I’m not sure that I could figure out
how to write a LaTeX class in a reasonable amount of time, but I sure can
write a document using LaTeX.
This is something that I learned to do when I wrote my undergraduate
thesis, and while I don’t write LaTeX often anymore, it’s really not
that hard.</p>
<p>A very basic LaTeX file would look something like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">\documentclass</span><span class="nb">{</span>article<span class="nb">}</span>
<span class="k">\begin</span><span class="nb">{</span>document<span class="nb">}</span>
<span class="k">\title</span><span class="nb">{</span>My Report Title<span class="nb">}</span>
<span class="k">\author</span><span class="nb">{</span>A. Student<span class="nb">}</span>
<span class="k">\maketitle</span>
<span class="k">\section</span><span class="nb">{</span>Introduction<span class="nb">}</span>
Some text
<span class="k">\end</span><span class="nb">{</span>document<span class="nb">}</span>
</code></pre></div>
<p>A <code>pandoc</code> template is just a LaTeX file, but with placeholders for the content
that <code>pandoc</code> will insert. These placeholders are just variables surrounded
with dollar signs. For example, <code>pandoc</code> has a variable called <code>body</code>. This
variable will contain the body of the report. We would simply put <code>$body$</code>
in the part of the template where we want <code>pandoc</code> to insert the body of the report.</p>
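<p>Putting this together with the basic LaTeX file above, a minimal <code>pandoc</code>
template might look like the following sketch (a real corporate template would add
its own preamble and title page):</p>

```latex
\documentclass{article}
\begin{document}
\title{$title$}
\author{$author$}
\maketitle
$body$
\end{document}
```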
<p><code>Pandoc</code> also supports <code>for</code> and <code>if</code> statements. A common pattern is to check
for the existence of a variable and use it if it does exist and use a default
value if it does not. The syntax for this would look something like:</p>
<div class="highlight"><pre><span></span><code><span class="o">$</span><span class="k">if</span><span class="p">(</span><span class="n">myvar</span><span class="p">)</span><span class="o">$</span><span class="w"></span>
<span class="w"> </span><span class="o">$</span><span class="n">myvar</span><span class="o">$</span><span class="w"></span>
<span class="o">$</span><span class="k">else</span><span class="o">$</span><span class="w"></span>
<span class="w"> </span><span class="n">Default</span><span class="w"> </span><span class="n">text</span><span class="w"></span>
<span class="o">$</span><span class="n">endif</span><span class="o">$</span><span class="w"></span>
</code></pre></div>
<p>I’ve written the above code on multiple lines for readability, but it could be
written on a single line too.</p>
<p>Similarly, if a variable is a list, you’d use a <code>for</code> statement to iterate over
the list. We’ll cover this later when we talk about adding logs of revisions.</p>
<h1>Defining New Template Variables</h1>
<p><code>Pandoc</code> defines a number of variables by default. However, you’ll likely need
to define some variables of your own. First of all, you’ll likely need to
define a variable for the report number and the revision.</p>
<p>To create the variable, it’s just a matter of defining it in the
<a href="http://yaml.org/"><code>YAML</code></a> header of the markdown file. Variables can either
have a single value or they can be lists. Elements of a list start with
a dash at the beginning of the line.</p>
<p>Once we add the report number (which we’ll call <code>report-no</code>) and the revision
(which we’ll call <code>rev</code>) to the <code>YAML</code> header, the <span class="caps">YAML</span> header will look like
the following:</p>
<div class="highlight"><pre><span></span><code><span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="s">"Report</span><span class="nv"> </span><span class="s">Title"</span><span class="w"></span>
<span class="nt">author</span><span class="p">:</span><span class="w"> </span><span class="s">"A.</span><span class="nv"> </span><span class="s">Student"</span><span class="w"></span>
<span class="nt">report-no</span><span class="p">:</span><span class="w"> </span><span class="s">"RPT-001"</span><span class="w"></span>
<span class="nt">rev</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">B</span><span class="w"></span>
</code></pre></div>
<p>(Bonus points if you immediately thought of William Sealy Gosset when you read that).</p>
<p>We’ll probably want to add a log of revisions to the report. The contents of
this log of revisions will have to come from somewhere, and the <code>YAML</code> header
is the most logical place. The log of revisions will be a list with one
element of the list corresponding to each revision in the log. Lists can
have nested members. In our case, an entry within the log of revisions
will have a revision letter, a date and a description. Including the
log of revisions, the <code>YAML</code> header will look like this:</p>
<div class="highlight"><pre><span></span><code><span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="s">"Report</span><span class="nv"> </span><span class="s">Title"</span><span class="w"></span>
<span class="nt">author</span><span class="p">:</span><span class="w"> </span><span class="s">"A.</span><span class="nv"> </span><span class="s">Student"</span><span class="w"></span>
<span class="nt">report-no</span><span class="p">:</span><span class="w"> </span><span class="s">"RPT-001"</span><span class="w"></span>
<span class="nt">rev</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">B</span><span class="w"></span>
<span class="nt">rev-log</span><span class="p">:</span><span class="w"></span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">rev</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">A</span><span class="w"></span>
<span class="w"> </span><span class="nt">date</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1-Jun-2019</span><span class="w"></span>
<span class="w"> </span><span class="nt">desc</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Initial release</span><span class="w"></span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">rev</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">B</span><span class="w"></span>
<span class="w"> </span><span class="nt">date</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">18-Jun-2019</span><span class="w"></span>
<span class="w"> </span><span class="nt">desc</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Updated loads based on flight test data</span><span class="w"></span>
</code></pre></div>
<p>We can now use these variables in our <code>pandoc</code> template. Using the variables
<code>report-no</code> and <code>rev</code> is straightforward and will be just the same as
using the default variables (like <code>title</code> and <code>author</code>).</p>
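<p>For example, a title block that prints the report number and revision might
look like this sketch (the surrounding LaTeX is illustrative, not part of any
particular template):</p>

```latex
\begin{flushright}
Report No.: $report-no$ \\
Revision: $rev$
\end{flushright}
```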
<p>Using the list variables will require the use of a <code>for</code> statement. In the
case of a log of revisions, each revision will get a row in a LaTeX table.
Using the variable <code>rev-log</code>, this table will look like this:</p>
<div class="highlight"><pre><span></span><code><span class="k">\begin</span><span class="nb">{</span>tabular<span class="nb">}{</span>| m<span class="nb">{</span>0.25in<span class="nb">}</span> | m<span class="nb">{</span>0.95in<span class="nb">}</span> | m<span class="nb">{</span>4.0in<span class="nb">}</span> |<span class="nb">}</span>
<span class="k">\hline</span>
Rev Ltr <span class="nb">&</span> Date <span class="nb">&</span> Description <span class="k">\\</span>
<span class="s">$</span><span class="nb">for</span><span class="o">(</span><span class="nb">rev</span><span class="o">-</span><span class="nb">log</span><span class="o">)</span><span class="s">$</span>
<span class="k">\hline</span>
<span class="s">$</span><span class="nb">rev</span><span class="o">-</span><span class="nb">log.rev</span><span class="s">$</span> <span class="nb">&</span> <span class="s">$</span><span class="nb">rev</span><span class="o">-</span><span class="nb">log.date</span><span class="s">$</span> <span class="nb">&</span> <span class="s">$</span><span class="nb">rev</span><span class="o">-</span><span class="nb">log.desc</span><span class="s">$</span> <span class="k">\\</span>
<span class="s">$</span><span class="nb">endfor</span><span class="s">$</span>
<span class="k">\hline</span>
<span class="k">\end</span><span class="nb">{</span>tabular<span class="nb">}</span>
</code></pre></div>
<p>In the above LaTeX code, everything between <code>$for(...)$</code> and <code>$endfor$</code> gets
repeated for each item in the list <code>rev-log</code>. We can access the nested members
using dot notation.</p>
<h1>Using the Pandoc Template from an R-Notebook</h1>
<p>RStudio handles a lot of the interface with <code>pandoc</code>. Adding the following to
the <code>YAML</code> header of the R-Notebook should cause RStudio to use your new
template when it compiles the R-Notebook to <span class="caps">PDF</span>. This <em>should</em> be all
you need to do.</p>
<div class="highlight"><pre><span></span><code><span class="nt">output</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="nt">pdf_document</span><span class="p">:</span><span class="w"></span>
<span class="w"> </span><span class="nt">template</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">my_template_file.tex</span><span class="w"></span>
<span class="w"> </span><span class="nt">toc_depth</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">3</span><span class="w"></span>
<span class="w"> </span><span class="nt">fig_caption</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">true</span><span class="w"></span>
<span class="w"> </span><span class="nt">keep_tex</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">false</span><span class="w"></span>
<span class="w"> </span><span class="nt">df_print</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">kable</span><span class="w"></span>
</code></pre></div>
<h1>Using the Pandoc Template from a Jupyter Notebook</h1>
<p>Using your new <code>pandoc</code> template from a Jupyter Notebook is a bit more
complicated because Jupyter doesn’t work directly with <code>pandoc</code>. First of all,
we need to tell <code>nbconvert</code> to convert to markdown. I think that it’s best to
re-run the notebook at the same time (to make sure that it is, in fact,
fully reproducible). You can do this using <code>nbconvert</code> as follows:</p>
<div class="highlight"><pre><span></span><code>jupyter nbconvert --execute --to markdown my-notebook.ipynb
</code></pre></div>
<p>But, Jupyter notebooks don’t have <code>YAML</code> headers like R-Notebooks do, so
we need a place to put all the variables that the template needs. The easiest
way to do this is to create a cell at the beginning of the notebook with the
cell type set as <code>raw</code>, then enter the <code>YAML</code> header into this cell, including
the starting and ending fences (<code>---</code>). This cell would then have content
similar to the following. Cells of type <code>raw</code> simply get copied to the output,
so this becomes the <code>YAML</code> header in the resulting markdown file.</p>
<div class="highlight"><pre><span></span><code><span class="nn">---</span><span class="w"></span>
<span class="nt">title</span><span class="p">:</span><span class="w"> </span><span class="s">"Report</span><span class="nv"> </span><span class="s">Title"</span><span class="w"></span>
<span class="nt">author</span><span class="p">:</span><span class="w"> </span><span class="s">"A.</span><span class="nv"> </span><span class="s">Student"</span><span class="w"></span>
<span class="nt">report-no</span><span class="p">:</span><span class="w"> </span><span class="s">"RPT-001"</span><span class="w"></span>
<span class="nt">rev</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">B</span><span class="w"></span>
<span class="nt">rev-log</span><span class="p">:</span><span class="w"></span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">rev</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">A</span><span class="w"></span>
<span class="w"> </span><span class="nt">date</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1-Jun-2019</span><span class="w"></span>
<span class="w"> </span><span class="nt">desc</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Initial release</span><span class="w"></span>
<span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">rev</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">B</span><span class="w"></span>
<span class="w"> </span><span class="nt">date</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">18-Jun-2019</span><span class="w"></span>
<span class="w"> </span><span class="nt">desc</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Updated loads based on flight test data</span><span class="w"></span>
<span class="nn">---</span><span class="w"></span>
</code></pre></div>
<p>Once you’ve used <code>nbconvert</code> to create the markdown file, you can call
<code>pandoc</code>. You’ll have to provide the template as a command-line argument,
specify the output filename (so that <code>pandoc</code> knows you want a
<span class="caps">PDF</span>), and give the code highlighting style. The call to <code>pandoc</code> will
look something like this:</p>
<div class="highlight"><pre><span></span><code>pandoc my-notebook.md -N --template<span class="o">=</span>my_template_file.tex -o my-notebook.pdf --highlight-style<span class="o">=</span>tango
</code></pre></div>
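<p>The two-step conversion can be scripted so that nobody forgets the arguments.
The following Python sketch just assembles and runs the two commands shown above;
the function names are my own, not part of Jupyter or <code>pandoc</code>:</p>

```python
import subprocess

def build_commands(notebook, template, highlight="tango"):
    """Build the nbconvert and pandoc command lines for one notebook."""
    stem = notebook.rsplit(".ipynb", 1)[0]
    nbconvert = ["jupyter", "nbconvert", "--execute",
                 "--to", "markdown", notebook]
    pandoc = ["pandoc", stem + ".md", "-N",
              "--template=" + template,
              "-o", stem + ".pdf",
              "--highlight-style=" + highlight]
    return nbconvert, pandoc

def render(notebook, template):
    # Re-executing the notebook guards reproducibility before rendering.
    for cmd in build_commands(notebook, template):
        subprocess.run(cmd, check=True)
```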
<h1>Documentation of Your Template</h1>
<p>A “trick” that I’ve used is to add some documentation about how to use the
template inside the template itself. It’s pretty unlikely that the user
will actually open up the template, but it’s relatively likely that the user
will forget one of the variables that the template expects. Since <code>pandoc</code>
allows <code>if/else</code> statements, I’ve added the following to my template:</p>
<div class="highlight"><pre><span></span><code><span class="s">$</span><span class="nb">if</span><span class="o">(</span><span class="nb">abstract</span><span class="o">)</span><span class="s">$</span>
<span class="k">\abstract</span><span class="nb">{</span><span class="s">$</span><span class="nb">abstract</span><span class="s">$</span><span class="nb">}</span>
<span class="s">$</span><span class="nb">else</span><span class="s">$</span>
<span class="k">\abstract</span><span class="nb">{</span>
The documentation for using the template goes here
<span class="nb">}</span>
<span class="s">$</span><span class="nb">endif</span><span class="s">$</span>
</code></pre></div>
<p>This means that if the user forgets to define the <code>abstract</code> variable,
the cover page of the report (where the abstract normally goes in my
case) will contain the documentation for the template.</p>
<h1>Change Bars: Future Work</h1>
<p>One of the things that I haven’t yet figured out is change bars. In my
organization, we put vertical bars in the margin of reports to indicate
what part of a report has been revised. There are LaTeX packages for
(manually) inserting
<a href="https://www.ctan.org/pkg/changebar">change bars into documents</a>. However,
I haven’t yet figured out how to automatically insert these into a report
generated using <code>pandoc</code>. I’m sure there’s a way, though.</p>
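<p>For reference, manual change bars with the <code>changebar</code> package look
roughly like this (an untested sketch based on the package’s documented commands):</p>

```latex
\usepackage{changebar}
% ...
\cbstart
This paragraph was revised at Rev B and gets a bar in the margin.
\cbend
```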
<h1>Conclusion</h1>
<p>I hope that this demystifies the process of writing a <code>pandoc</code> template
to allow you to create reports directly from Jupyter Notebooks or R-Notebooks
in your company’s report format.</p>
<p><em>(Edited to fix a few typos)</em></p>
<h1>Package Adequacy for Engineering Calculations</h1>
<p><em>2019-06-29, by Stefan Kloppenborg</em></p>
<p>If you do engineering calculations or analysis using a language like
<a href="https://www.r-project.org/">R</a> or <a href="https://www.python.org/">Python</a>,
chances are that you’re going to use some packages.
Packages are collections of code that someone else has written that you
can use in your code.
For example, if you need to solve a system of linear equations
by inverting a matrix and you’re using Python, you might use
<a href="http://www.numpy.org/">numpy</a>.
Or if you’re using R and you need to fit a linear model to some data,
you would probably use the
<a href="https://stat.ethz.ch/R-manual/R-devel/library/stats/html/00Index.html">stats</a> package.</p>
<p>If you’re involved in “engineering,” you need a high level of confidence
that the results that you’re getting are correct.
Note that in this post — and my blog in general — when I say
“engineering,” I don’t mean <em>software engineering:</em> I mean the design and analysis
of structures or systems that have an effect on safety.
I work in civil aeronautics, mainly dealing with composites, but also
dealing with metallic structure regularly.
Depending on the particular type of engineering that you’re engaged in
and the particular problem at hand, the consequences of getting the
wrong answer could be fairly severe.
You better be sure that both the interpreter and the packages are correct.
Probably the best way to do this is to validate the results using another
method: are there published results for a similar problem that you can use as a benchmark?
Perhaps you can do some physical testing?
But even if you’re doing your due diligence and validating the results somehow,
you will still waste a lot of your time if there is a problem with either
the interpreter or one of the packages.</p>
<p>Compiled languages — like <code>C</code> or <code>FORTRAN</code> — are compiled into machine code
that runs directly on the processor.
Interpreted languages, like Python, R or JavaScript, are not compiled into
machine code, but instead an interpreter (a piece of software) reads each
line of code and figures out how to run it when you run the code
(not ahead of time).
As far as interpreters go, if you’re using <code>CPython</code> (the “standard”
Python interpreter) or <code>GNU-R</code> (the “standard” R interpreter),
I think there is a rather low risk that there are any errors in the interpreter.
These interpreters are written by a bunch of smart people, and both are open
source, so the code that makes up the interpreters themselves is read by a
much larger group of smart people.
Furthermore, both interpreters are widely used and have been around for a
while, so it’s very likely that significant bugs that are likely to change
the result of an engineering calculation would have been found by users by
now and would have been fixed.</p>
<p>Packages are more of a risk than interpreters are.
Again, if you’re using a very widely used package that has been around for a
while, like <code>numpy</code> (in Python) or <code>stats</code> (in R), there’s a pretty good
chance that any bugs that would affect your calculations would have been
found by now — and packages like these are maintained by groups of dedicated people.</p>
<p>If you’re using R, chances are that you’re getting your packages from
<a href="https://cran.r-project.org/"><span class="caps">CRAN</span></a>.
You should be reading the <span class="caps">CRAN</span> page for the package that you’re using.
You can find an example of such a page
<a href="https://cran.r-project.org/package=MASS">here</a>.
There are a few things that you should look for to help you evaluate the
reliability of the package (in addition to reference manual and any vignettes
that explain how to use the package).
The first is the priority of the package.
Not all packages have a priority, but if the priority is “base” or
“recommended,” the package is maintained by the r-core team and is almost
certainly used by a lot of people.
You can be fairly comfortable with these packages.</p>
<p>The second thing that you should look at on the <span class="caps">CRAN</span> page for a package is
the <span class="caps">CRAN</span> Checks.
<span class="caps">CRAN</span> will test all the packages every time a new version of R is released and
it tests all the packages routinely to determine if a change in one package
caused errors in other packages.
You can see an example <span class="caps">CRAN</span> Check for my package <code>rde</code>
<a href="https://cran.r-project.org/web/checks/check_results_rde.html">here</a>.</p>
<p>This practice is called
<a href="https://en.m.wikipedia.org/wiki/Continuous_integration">continuous integration</a>.
<span class="caps">CRAN</span> runs these checks on several different operating systems — Windows,
<span class="caps">OSX</span>, and several Linux distributions.
If you open the <span class="caps">CRAN</span> Checks results for a package, you’ll see a table of all
the various combinations of R version and operating system that have been
tested along with the amount of time that it took to run the test and a status
for each.
If the Status is “<span class="caps">OK</span>,” then there were no errors identified.
If the Status is “<span class="caps">NOTE</span>,” “<span class="caps">WARNING</span>,” or “<span class="caps">ERROR</span>,”
there might be something wrong, and it may or may not be serious.
If you click on the Status link, you’ll see details and can evaluate for yourself.</p>
<p>I think that these <span class="caps">CRAN</span> checks are actually a very strong point for the R
ecosystem.
It ensures that package maintainers know when something outside of their
package breaks their code.
And, it enforces a certain level of quality: package maintainers are given a
certain amount of time to fix errors, and if they don’t, the package gets
removed from <span class="caps">CRAN</span>.</p>
<p>The <span class="caps">CRAN</span> checks do a few things.
First, they check that the package can, in fact, be loaded (maybe there’s an
error that prevents you from using it at all).
There are a few other things that they do, but the most important in terms of
reliability of the package is that the <span class="caps">CRAN</span> checks will run any test created
by the package maintainer.
These tests are called
<a href="https://en.m.wikipedia.org/wiki/Unit_testing">unit tests</a>.
They are tests that determine whether the code in the package actually has the
expected behavior.
Package maintainers don’t have to write unit tests, but the good ones do.
You can look at what tests the package maintainer has written by downloading
the code of the package (you can download it from <span class="caps">CRAN</span>).
The tests are in a folder called <code>tests</code>.
Tests basically work by providing some input to the package’s functions,
and checking that the result is correct.
For R packages, the <code>testthat</code> framework is a popular testing framework.
For packages that use the <code>testthat</code> framework, you’ll see a number of
statements that use the <code>expect_...</code> family of functions.
Some of these tests will likely ensure that the package works at
all — checking things like the return type for functions, or that a
function actually does raise certain errors when invalid arguments are
passed to it.
Some of the tests should also ensure that the package provides correct results.
When I write tests for a package, I always write both types of tests.
For the tests that ensure that the results are correct, I often either check
cases that have closed-form solutions, or check that the code in the package
produces results that are approximately equal to example results published in
articles or books.
You’ll need to read through the tests to decide if they provide enough
assurance that the package is correct.</p>
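<p>The same two kinds of tests translate directly to Python. The sketch below
uses plain <code>assert</code> statements; <code>beam_deflection</code> is a
hypothetical function standing in for whatever the package under test provides:</p>

```python
import math

def beam_deflection(P, L, E, I):
    """Tip deflection of a cantilever with a point load P at the free end
    (hypothetical stand-in for a function from the package under test)."""
    return P * L ** 3 / (3 * E * I)

# Behavioural test: the function returns a float for valid input.
assert isinstance(beam_deflection(1000.0, 2.0, 200e9, 1e-6), float)

# Correctness test: compare against a closed-form case worked by hand.
# With P = 300 N, L = 1 m, and E*I = 100 N*m^2: delta = 300 / 300 = 1 m.
assert math.isclose(beam_deflection(300.0, 1.0, 100.0, 1.0), 1.0)
```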
<p>If you decide that the tests for a package are not sufficient, you have
three options.</p>
<ul>
<li>You could choose not to use that package: maybe there is another that
does something similar.</li>
<li>You can write tests yourself and contribute those tests back to the
package maintainer.
After all, R packages are open-source and users are encouraged to
contribute back to the community.
Most package maintainers would be happy to receive a patch that adds more
tests: writing tests is not fun, and most people would be grateful if
someone else offers to do it.</li>
<li>You could also manually test the package. The difficulty here is ensuring
that you re-test the package every time you update the version of this
package on your system.</li>
</ul>
<p>In the Python world, continuous integration isn’t as well integrated into the
ecosystem.
Most packages that you install probably come from PyPI.
As far as I know, PyPI doesn’t do any continuous integration: it’s up to the
package maintainer to run their tests regularly.
Package maintainers can do one of two things: they can run the tests on their
own machine before releasing a new version to PyPI, or they can use a
continuous integration service like Travis-<span class="caps">CI</span> or CircleCI.
Many of the continuous integration services provide the service for free for
open source projects, so many Python packages do use a continuous integration
service.
Packages that use a continuous integration service normally advertise it in
their <span class="caps">README</span> file.
You’ll still need to assess whether the tests are adequate, and if the
package doesn’t use continuous integration, you’ll have to either run the
tests yourself or trust that the package maintainer did.</p>
<p>If you have already written tests for your package, setting up continuous
integration using Travis-<span class="caps">CI</span> is quite straightforward. I haven’t personally
used CircleCI, but I would imagine that it’s similarly easy to use.
You can see the continuous integration results from my package <code>rde</code> on
Travis-<span class="caps">CI</span> <a href="https://travis-ci.com/kloppen/rde">here</a>.</p>
<p>Whether you’re using Python or R, there are ways of ensuring that the packages
you use for engineering calculations are adequate for your needs.
Some people seem to be a little bit scared of open source packages and
software for engineering calculations, but in a lot of ways, open source
software is actually better for this since you have the ability to verify
it yourself and to make a decision about whether to use it.</p>
<h1>Automating Software Validation Reports</h1>
<p><em>2019-06-20, by Stefan Kloppenborg</em></p>
<p>I’ve been working on a Python package to analyze adhesively bonded joints
recently. This package will be used to analyze adhesive joints in certain
aircraft structure and will be used to substantiate the design of structural
repairs, amongst other uses. Because of this, the output of this package needs
to be validated against test data. This validation also needs to be documented
in an engineering report.</p>
<p>I’ve been thinking about how to do this. On one hand, I’ve been thinking about
the types of (mechanical) tests that we’ll need to run to validate the model
and the various test configurations that we’ll need to include in the
validation test matrix. On the other hand, I’ve also been thinking about
change management of the package and ensuring that the validation report
stays up to date.</p>
<p>I’m imagining the scenario where we run the validation
testing and find that the model and the test results agree within, say, 10%.
Maybe that’s good enough for the purpose (depending on the direction of the
disagreement). We can then write our validation report
and type out the sentence <em>“the test data and the model were found to agree
within 10%.”</em> Then, I’m imagining that we make a refinement to the model
formulation and release a new version of the package that now agrees with the
test data within 5%. Now, we have a validation report for the old version
of the package, but no report describing the validation of the new version.
We’d need to go back through the validation report, re-run the model for all
the validation cases and update the report.</p>
<p>When we update the validation report manually, there’s probably a pretty good
chance that some aspect of the update gets missed. Maybe it’s as simple as
one of the model outputs not getting updated in the revised validation report.
It’s also potentially rather time consuming to update this report. It would
be faster to make this validation report a Jupyter Notebook (which I’ve
<a href="https://www.kloppenborg.ca/2019/06/reproducibility/">previously talked about</a>). I haven’t yet
written about it here, but it is possible to have a Jupyter Notebook render to
a <span class="caps">PDF</span> using a corporate report format, so it’s even possible to make this
validation report look like it should
<em>(Edit: I’ve now written about this
<a href="https://www.kloppenborg.ca/2019/10/pandoc-report-templates/">here</a>)</em>.
We could also set up a test in the
package to re-run the Jupyter Notebook, and perhaps integrate it into a
continuous integration system so that the Notebook gets re-run every time a
change is made to the package. This would mean that the validation report
is always up to date.</p>
<p>When you write a Jupyter Notebook, it usually
has some code that produces a result — either a numeric result, or a graph —
and then you have some text that you’ve written which explains the result.
The problem is that this text that you’ve written doesn’t respond to
changes in the result. Sure, there are ways of automatically updating
individual numbers inside the text that you’ve written, but sometimes the way
that the result of the code changes warrants a change in the sentiment of the
text. Maybe the text needs to change from <em>“the model shows poor agreement
with experimental results and shouldn’t be used in this case”</em> to <em>“the model
shows excellent agreement with experimental results and has been validated.”</em>
There’s no practical way that this type of update to the text could be
automated. But if the update to the result of the code in the Notebook has
been automated, there’s a good chance that the text and the results from
the code will end up disagreeing — especially if the report is more than
a few pages.</p>
<h1>The Solution</h1>
<p>So, what can be done to rectify this? We want to have the ease of having
the results of the code automatically update, but we want to make sure that
those results and the text of the report match. One approach to this
problem — and the approach that I intend to use for the adhesive joint
analysis package — is to add <code>assert</code> statements to the Notebook. This way,
if the assertion fails, the Notebook won’t automatically rebuild and our
attention will be drawn to the issue.</p>
<p>As an example, if the text says that the model is conservative, meaning that
the strain predicted by the model is higher than the strain measured by
strain gauges installed on the test articles from the validation testing, we
could write the following assert statement in the Jupyter Notebook:</p>
<div class="highlight"><pre><span></span><code><span class="k">assert</span><span class="p">(</span><span class="n">model_strain</span> <span class="o">></span> <span class="n">experimental_strain</span><span class="p">)</span>
</code></pre></div>
<p>Now, if we later make a change to the model that causes it to under-predict
strain, we’ll be alerted to this and prompted to update the validation report.</p>
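<p>Assertions can also encode quantitative claims from the report, such as <em>“the test data and the model were found to agree within 10%.”</em> A sketch of what that might look like in a Notebook cell is below — the helper name and the numeric values are illustrative, not from the actual package:</p>

```python
def assert_within_tolerance(model, experiment, rel_tol=0.10):
    """Fail the Notebook run if model and experiment disagree by more than rel_tol."""
    rel_error = abs(model - experiment) / abs(experiment)
    assert rel_error <= rel_tol, (
        f"model disagrees with experiment by {rel_error:.1%} (limit {rel_tol:.0%})"
    )

# A 5% disagreement passes a 10% tolerance check; a 20% disagreement would
# stop the Notebook and prompt an update to the report text.
assert_within_tolerance(model=105.0, experiment=100.0)
```

If a later version of the package changes the disagreement beyond the stated tolerance, the Notebook fails to rebuild and the claim in the text gets revisited rather than silently going stale.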
<h1>Implementing the Solution</h1>
<p>To run a Jupyter Notebook from code (for example, in a test suite), I’ve used
the following code in the past. This code was based on code found on
<a href="http://blog.thedataincubator.com/2016/06/testing-jupyter-notebooks/">The Data Incubator Blog</a>.</p>
<div class="highlight"><pre><span></span><code><span class="kn">import</span> <span class="nn">os</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="kn">import</span> <span class="nn">nbformat</span>
<span class="kn">from</span> <span class="nn">nbconvert.preprocessors</span> <span class="kn">import</span> <span class="n">ExecutePreprocessor</span><span class="p">,</span> <span class="n">CellExecutionError</span>

<span class="k">def</span> <span class="nf">_notebook_run</span><span class="p">(</span><span class="n">path</span><span class="p">):</span>
<span class="n">kernel_name</span> <span class="o">=</span> <span class="s2">"python</span><span class="si">{}</span><span class="s2">"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">version_info</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
<span class="n">file_dir</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">path</span><span class="o">.</span><span class="n">dirname</span><span class="p">(</span><span class="vm">__file__</span><span class="p">)</span>
<span class="n">errors</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span>
<span class="n">nb</span> <span class="o">=</span> <span class="n">nbformat</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">as_version</span><span class="o">=</span><span class="mi">4</span><span class="p">)</span>
<span class="n">nb</span><span class="o">.</span><span class="n">metadata</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">"kernelspec"</span><span class="p">,</span> <span class="p">{})[</span><span class="s2">"name"</span><span class="p">]</span> <span class="o">=</span> <span class="n">kernel_name</span>
<span class="n">ep</span> <span class="o">=</span> <span class="n">ExecutePreprocessor</span><span class="p">(</span><span class="n">kernel_name</span><span class="o">=</span><span class="n">kernel_name</span><span class="p">,</span> <span class="n">timeout</span><span class="o">=</span><span class="mi">3600</span><span class="p">)</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">ep</span><span class="o">.</span><span class="n">preprocess</span><span class="p">(</span><span class="n">nb</span><span class="p">,</span> <span class="p">{</span><span class="s2">"metadata"</span><span class="p">:</span> <span class="p">{</span><span class="s2">"path"</span><span class="p">:</span> <span class="n">file_dir</span><span class="p">}})</span>
<span class="k">except</span> <span class="n">CellExecutionError</span> <span class="k">as</span> <span class="n">e</span><span class="p">:</span>
<span class="k">if</span> <span class="s2">"SKIP"</span> <span class="ow">in</span> <span class="n">e</span><span class="o">.</span><span class="n">traceback</span><span class="p">:</span>
<span class="n">errors</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">e</span><span class="o">.</span><span class="n">traceback</span><span class="p">))</span>
<span class="k">else</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">e</span>
<span class="k">return</span> <span class="n">nb</span><span class="p">,</span> <span class="n">errors</span>
<span class="n">_notebook_run</span><span class="p">(</span><span class="s2">"file-name-of-my-notebook.ipynb"</span><span class="p">)</span>
</code></pre></div>
<p>This code will run the Notebook <code>file-name-of-my-notebook.ipynb</code> and will
raise an error if an error is encountered. If this is run inside a <code>unittest</code>
or <code>nose</code> test suite, this will cause a test failure.</p>
<h1>Conclusion</h1>
<p>Validating software used in a way that affects an aircraft design is very
important in ensuring the safety of that design. Keeping the validation report
up to date can be tedious, but can be automated using Jupyter Notebooks.
The conclusions drawn in the validation report need to match the results of the
software being validated. One approach to ensuring that this is always true
is to add <code>assert</code> statements to the Jupyter Notebook that forms the
validation report.</p>Reproducibility of Engineering Calculations2019-06-20T00:00:00-04:002019-06-20T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2019-06-20:/2019/06/reproducibility/<p>Reproducibility in engineering work doesn’t seem to get the attention that it
deserves. I can’t count the number of times that I’ve read an old engineering
report in search of a particular result, only to find that the calculation that
led to that result is only barely …</p><p>Reproducibility in engineering work doesn’t seem to get the attention that it
deserves. I can’t count the number of times that I’ve read an old engineering
report in search of a particular result, only to find that the calculation that
led to that result is only barely described, or there is just a screenshot of
an Excel workbook with a few input numbers and a final result. When I find
things like this, it makes me a little nervous: did the original author use
the correct formula when computing this result? What assumptions did the
author make and neglect to document? What approximations were made? Was the
original review of the report diligent enough to check this particular result?</p>
<p>Let’s take a hypothetical example. For simplicity, let’s assume that we’re
analyzing some sort of bracket. It’s 2 inches wide, 0.125 inches thick and
5 inches long. It’s cantilevered with a load applied 2 inches from the free
edge. We care about both the deflection and the maximum stress. The formulae
for deflection and stress are given by Roark<sup id="fnref:1"><a class="footnote-ref" href="#fn:1">1</a></sup>. We’ll adapt those
equations slightly:</p>
<p>$$
\delta_a = \frac{-P}{6 E I} (2 L^3 - 3 L^2 a + a^3) $$</p>
<p>$$
\sigma = \frac{M_B \frac{t}{2}}{I} = \frac{P (L - a) \frac{t}{2}}{I} $$</p>
<p>Given these equations and the data above, we could quite easily do the
calculation in a spreadsheet program like <span class="caps">MS</span>-Excel. But, if we want to
include our calculation in a report (most likely as a screenshot of the
spreadsheet), our report will probably just look like this:</p>
<p><img alt="excel1" src="https://www.kloppenborg.ca/2019/06/reproducibility/reproducibility_excel.png"></p>
<p>This shows the “right” answer, but if you’re reviewing the report, how do you
know that the answer is right? If you’re reviewing the report before it’s
released, you can probably get a copy of the Excel file and check the
formulae in the cells. You’ll spend a few minutes deciphering the formula to
figure out if it’s correct. But, if you’re reading the report later,
especially if you’re outside the company that wrote it, good luck. You’re
going to have to get out a pen, paper and your calculator to repeat the
calculation and figure out if it’s right. This problem is even worse if the
author of the report hard coded in a few of the input values (i.e. length,
width, elastic modulus, etc.) into the formulae.</p>
<p>There are a few ways to address this problem of reproducibility. We’ll explore
two of these ways. The first is to use software like
<a href="https://www.ptc.com/en/products/mathcad">MathCAD</a>, or its free
alternative <a href="https://en.smath.info/view/SMathStudio/summary">SMath-Studio</a>.
Both of these products are <a href="https://en.wikipedia.org/wiki/WYSIWYG"><span class="caps">WYSIWYG</span></a>
math editors that are unit aware. With either of these, you could do your
calculations in MathCAD or SMath-Studio and paste a screenshot of this
into your report. </p>
<p><img alt="smath" src="https://www.kloppenborg.ca/2019/06/reproducibility/reproducibility_smath.png"></p>
<p>Now, the input data and the formula would be shown directly in the report.
The added benefit is that, since these pieces of software are unit aware, you
can’t make simple unit errors — if you forget an exponent, the units shown
in the result won’t be what you expect, so you know that you’ve made a mistake.</p>
<p>The other way to approach this problem is to use something called a notebook.
If you’re comfortable enough to write simple code in
<a href="https://www.python.org/">Python</a>, you could use a
<a href="http://jupyter.org/">jupyter notebook</a>. If you’re doing some data analysis
or statistics, you might prefer to write some code in
<a href="https://www.r-project.org/">R</a> (though, you could use
<a href="https://pandas.pydata.org/">pandas</a> if you prefer to use Python). While you can
use R with jupyter notebooks (as well as several other languages), in my
opinion R Studio’s
<a href="https://rmarkdown.rstudio.com/r_notebooks.html">R Notebooks</a> are a little
bit better to work with. If you were to do the same calculation with a
notebook (in this case, we’ll use a jupyter notebook and Python), it would
look like this:</p>
<p><img alt="notebook" src="https://www.kloppenborg.ca/2019/06/reproducibility/reproducibility_notebook.png"></p>
<p>There are a few advantages of using a notebook. First, you can use a
programming language with a little bit more power than MathCAD or
SMath-Studio — if you need to do an iterative calculation or find the root
of a system of non-linear equations, you can do it with a language like
Python or R — and do so in a way that’s not too difficult for the reader to
understand. The other advantage of using a notebook is that notebooks are
intended to mix code, results and text. You could actually write your whole
report using a notebook! You could explain your approach to solving the
problem, include the code used to solve the problem and then show the results
all in the same document. No need to copy-and-paste anything and no need to
store multiple files (like a word document and a SMath-Studio file).</p>
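<p>For concreteness, the code behind a notebook cell like the one shown above might look roughly like this. Note that the load <code>P</code> and elastic modulus <code>E</code> below are assumed values for illustration — they aren’t specified in the bracket example:</p>

```python
# Cantilevered bracket: free-end deflection and bending stress (Roark's formulas)
P = 10.0     # load, lbf (assumed for illustration)
E = 10.0e6   # elastic modulus, psi (assumed, roughly aluminum)
L = 5.0      # length, in
a = 2.0      # load position, measured from the free end, in
b = 2.0      # width, in
t = 0.125    # thickness, in

# Second moment of area of the rectangular cross-section
I = b * t**3 / 12

# Deflection at the free end and stress at the fixed end
delta = -P / (6 * E * I) * (2 * L**3 - 3 * L**2 * a + a**3)
sigma = P * (L - a) * (t / 2) / I

print(f"delta = {delta:.4f} in, sigma = {sigma:.0f} psi")
# prints: delta = -0.0553 in, sigma = 5760 psi
```

The reviewer sees the inputs, the formulae and the results in one place, with nothing hidden in spreadsheet cells.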
<p>Text written in a notebook (either a jupyter notebook or an R Notebook) is
written using something called
<a href="https://en.m.wikipedia.org/wiki/Markdown">markdown</a>. This is a
“lightweight” way of formatting text. If you want a bullet list, you just
type an asterisk at the beginning of each line; if you want a heading, you
start the line with a hash symbol (or two for a sub-heading). And, most
importantly for engineering reports, you can include formulae using
<a href="https://en.m.wikibooks.org/wiki/LaTeX/Mathematics">LaTeX</a> from within
markdown just by enclosing the formula with two dollar signs before and after
it — no need to suffer through using the <span class="caps">MS</span>-Word Equation Editor.</p>
<p>If you need a corporate format for your report, there are ways to create PDFs
from either a jupyter notebook or an R Notebook using a custom format. I plan
on writing about this in a later post. Stay tuned.
<em>(Edit: I’ve written about this <a href="https://www.kloppenborg.ca/2019/10/pandoc-report-templates/">here</a>)</em></p>
<p>We’ve explored a few ways of making an engineering report more reproducible.
Neither of the solutions explored is ideal for every scenario — some
scenarios are more suited to one of the solutions or the other — but both
will improve many engineering reports.</p>
<div class="footnote">
<hr>
<ol>
<li id="fn:1">
<p>W. Young and R. Budynas, Roark’s Formulas for Stress and Strain, Seventh Edition. New York: McGraw-Hill, 2002. <a class="footnote-backref" href="#fnref:1" title="Jump back to footnote 1 in the text">↩</a></p>
</li>
</ol>
</div>rde: Now on CRAN2018-07-09T00:00:00-04:002018-07-09T00:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2018-07-09:/2018/07/rde/<p>For the last couple of years, we’ve been using the statistical programming
language <a href="https://www.r-project.org">R</a> when we do statistical analysis or
data visualizations at work. We typically deal with <em>small data</em> — most of
the time, our data sets are high-tens or low-hundreds of rows of data.</p>
<p>A lot of the …</p><p>For the last couple of years, we’ve been using the statistical programming
language <a href="https://www.r-project.org">R</a> when we do statistical analysis or
data visualizations at work. We typically deal with <em>small data</em> — most of
the time, our data sets are high-tens or low-hundreds of rows of data.</p>
<p>A lot of the time, we create <a href="https://rmarkdown.rstudio.com/r_notebooks.html">R Notebooks</a>
with our analysis and visualizations. This works well for us: the R Notebook
contains the code used to do the analysis, the results of the analysis and
the visualizations, all in one place. This eliminates questions like: “did you
remove outliers before making the graph?” Or, “did you check that the data are
distributed normally before you did that test?” A reviewer of the R Notebook can
see exactly what was done.</p>
<p>By default, the R Notebook produces an html file that you can open in your
browser. You can email this html file
to a colleague, and they can see your results and graphs, as well as exactly
how you obtained them. If you made a logical mistake, or an inappropriate
assumption, your colleague has the opportunity to find it.</p>
<p>There is also a button in the html file that the R Notebook gets exported to
that says “Download Rmd.” This allows your colleague to open the notebook in
<a href="https://www.rstudio.com/">R Studio</a> and run your code. <em>If you sent your data.</em></p>
<p>The one problem with just emailing R Notebooks to a colleague is that the R
Notebook does not include the data. This might be okay if the data source is a
file on a network, or a database
that you both have access to, but in a lot of cases — at least in my work
— the data is a <span class="caps">CSV</span> or Excel file. Now, if I want to send an R Notebook
to a colleague to review, I need to remember to send the data file along with it.</p>
<p>Enter <code>rde</code>.</p>
<p>I wrote the package <a href="https://cran.r-project.org/web/packages/rde/"><code>rde</code></a>
(which stands for Reproducible Data Embedding) to tackle this problem. This
package allows you to embed data right in your R Notebook (or any other R code).
It does so by compressing the data and then
<a href="https://en.wikipedia.org/wiki/Base64">base-64 encoding</a> it into an <span class="caps">ASCII</span>
string. This string can be pasted into the R Notebook and converted back into
the original data when someone re-runs the Notebook.</p>
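<p>The underlying trick isn’t specific to R. As a sketch of the idea — not of <code>rde</code>’s actual implementation — the same compress-then-encode round trip can be done in Python with only the standard library:</p>

```python
import base64
import zlib

def embed(data: bytes) -> str:
    """Compress the data, then base-64 encode it into an ASCII string."""
    return base64.b64encode(zlib.compress(data)).decode("ascii")

def unembed(token: str) -> bytes:
    """Reverse the embedding: decode the base-64 string, then decompress."""
    return zlib.decompress(base64.b64decode(token))

csv_data = b"width,thickness,length\n2.0,0.125,5.0\n"
token = embed(csv_data)            # paste this ASCII string into the notebook
assert unembed(token) == csv_data  # round-trips exactly
```

Because the token is plain ASCII, it survives being pasted into a notebook, an email, or a source file, and the recipient can recover the original data bit-for-bit.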
<p>I won’t go into all the details of how to use the package. If you’d like to
learn more, you can read the package <a href="https://cran.r-project.org/web/packages/rde/vignettes/rde_tutorial.html">vignette</a>.</p>
<p>This isn’t the first R Package that I’ve written, but it is the first one that
I’ve submitted to <a href="https://cran.r-project.org/"><span class="caps">CRAN</span></a>. When you install an R
package using <code>install.packages()</code>, you’re installing it from <span class="caps">CRAN</span>. I think that
<span class="caps">CRAN</span> is one of the best parts of the R ecosystem since it does
<a href="https://en.wikipedia.org/wiki/Continuous_integration">continuous integration</a>
for all of the packages hosted there. This helps ensure that all the packages
continue to work as R is updated and as other packages are updated. I’ll likely
talk about this more in a future blog post.</p>
<p>If you’re an R user and you think that the package <code>rde</code> would help you in your
workflow, check it out. You can install it by typing <code>install.packages("rde")</code>
in R. If you find a bug, please file an
<a href="https://github.com/kloppen/rde/issues">issue on GitHub</a>. And, if you would
like to add functionality or improve it in some way, feel free to send me a
pull request.</p>Welcome to Kloppenborg.ca2018-06-27T22:00:00-04:002018-06-27T22:00:00-04:00Stefan Kloppenborgtag:www.kloppenborg.ca,2018-06-27:/2018/06/welcome/<p>Welcome to kloppenborg.ca</p>
<p>I plan to use this website as a blog where I discuss topics related to
engineering, technology and whatever else I’m thinking about at the time.</p>
<p>If you find any of the posts here interesting, feel free to share them. If you
don’t feel …</p><p>Welcome to kloppenborg.ca</p>
<p>I plan to use this website as a blog where I discuss topics related to
engineering, technology and whatever else I’m thinking about at the time.</p>
<p>If you find any of the posts here interesting, feel free to share them. If you
don’t feel free to ignore them.</p>