Dueling Wordprint Studies

This is the 3rd post reviewing By the Hand of Mormon, by Terryl Givens.  I’ve taken a bit of an interest in wordprint studies.  Givens explains wordprint studies on page 156.

Computational stylistics is based on the premise that all authors exhibit subtle, quantifiable stylistic traits that are equivalent to a literary fingerprint, or wordprint.  The method has been used to investigate other instances of disputed authorship, from Plato to Shakespeare to the Federalist papers.  Analyzing blocks of words from 24 of the Book of Mormon’s ostensible authors, along with nine nineteenth-century writers including Joseph Smith, three statisticians used three statistical techniques (multivariate analysis of variance, cluster analysis, and discriminant analysis) to establish the probability that the various parts of the Book of Mormon were composed by the range of authors suggested by the narrative itself.  They found that all of the sample word blocks exhibit their own “discernible authorship styles (wordprints),” even though these blocks are not clearly demarcated in the text, but are “shuffled and intermixed” throughout the Book of Mormon’s editorially complex narrative structure (wherein alleged authorship shifts some 2,000 times).  Emphasizing the demonstrated resistance of these methods to even deliberate stylistic imitation, they further conclude that “it does not seem possible that Joseph Smith or any other writer could have fabricated a work with 24 or more discernible authorship styles.”  The evidence, they write, is “overwhelming” that the Book of Mormon was not written by Joseph Smith or any of his contemporaries or alleged collaborators they tested for (including Sidney Rigdon and Solomon Spaulding).4 A subsequent, even more sophisticated analysis by a Berkeley group concluded that it is “statistically indefensible to propose Joseph Smith or Oliver Cowdery or Solomon Spaulding as the author of 30,000 words…attributed to Nephi and Alma…The Book of Mormon measures multiauthored, with authorship consistent with its own internal claims.  These results are obtained even though the writings of Nephi and Alma were ‘translated’ by Joseph Smith.”5

Ok, let me talk about multivariate analysis of variance, cluster analysis, and discriminant analysis.  These are very advanced graduate-level statistical techniques.  Ronald Fisher was a famous English statistician (ok, only famous to statisticians) who pioneered many of these techniques.  Danish Professor Anders Hald said Fisher “almost single-handedly created the foundations for modern statistical science.”  Fisher died in 1962.  These techniques are really new, and frankly aren’t discussed in any bachelor’s level statistics courses.

Givens’s book was published in 2002.  From reading this paragraph, one would think wordprint studies are solidly in favor of Mormons.  However, in Dec 2008, Oxford Journals published a new study called “Reassessing authorship of the Book of Mormon using delta and nearest shrunken centroid classification.” I have a master’s degree in statistics, and until I saw this article, I had never heard of shrunken centroid classification.  I must say I have always been impressed with Wikipedia when it comes to math articles, but Wikipedia doesn’t even have an article on shrunken centroid classification.  I found this Stanford University article that describes the technique.  Apparently it is used in cancer gene analysis.  The authors of this Book of Mormon authorship article are three Stanford University professors:  Matthew L. Jockers (English), Daniela M. Witten (Statistics), Craig S. Criddle (Civil and Environmental Engineering).  They claim that “Our findings support the hypothesis that Rigdon was the main architect of the Book of Mormon and are consistent with historical evidence suggesting that he fabricated the book by adding theology to the unpublished writings of Spalding (then deceased).”

(The abstract is found here, but you have to pay $28 to actually view the article.)  FAIR has criticized the methodology of the study, because they didn’t include Joseph Smith as a possible author.  Why isn’t he as likely as Spalding to have written it?  It appears the Stanford professors decided that the true author of the Book of Mormon was one of only seven possible authors:  Oliver Cowdery, Parley P. Pratt, Sidney Rigdon, Solomon Spalding, Isaiah/Malachi, Joel Barlow, and Henry Longfellow.  Barlow and Longfellow are poets thrown in as controls, so it shouldn’t be a surprise that they didn’t match.  Since the Book of Mormon includes writings of Isaiah and Malachi, these portions should easily match, and the Jockers study concludes these portions match.

I guess my biggest problem with Jockers is this.  Quoting from the corrected abstract, “With the corrected data, NSC ranked Rigdon at 0.4626 and Spalding at 0.46525.”  If I am understanding this correctly, these numbers are probabilities.  So the probability that Sidney Rigdon is the real author of the Book of Mormon is less than 50%, which is not exactly a ringing endorsement, I’d say.

Now, to be fair, I don’t have the probabilities that Givens is referencing; perhaps they are suspect as well.  But I expect that Isaiah and Malachi have much higher probabilities than 0.4626 in the Jockers study.  So, what do you think of wordprint studies?

28 comments on “Dueling Wordprint Studies”

  1. MH:

    Any post about advanced statistical techniques is a fun read for me. Thanks.

    When Dale Broadhurst, if memory serves, showed up to talk about the Spaulding theory a few months ago, I was hoping to learn more about the nitty-gritty of these claims, but he wasn’t far enough along in his own work that I could run any checks.

    So I’d like to see more of the wordprint and literary analysis work done, but clearly, testing JS himself is a pretty glaring methodological omission. One might also compare the BofM to D&C sections where we have more reason to believe Rigdon was directly involved in the experience: “and now, after the many testimonies that have been given of him, this is the testimony last of all that WE give of Him”.

  2. I enjoy the science of these types of analysis and think they have some bearing, but I have to ask the question, if Rigdon was the author, what was his motive for hiding his authorship and attributing it to Joseph Smith? It would have seemed his best argument after Joseph’s death to lead the church as he so desperately wanted to. And yet he never stated that he wrote it. I’ve looked at a lot of these types of studies and have found most weak when analyzing the Book of Mormon for alternative authors such as Rigdon, Spaulding or Cowdery, as you pointed out here. The best explanation for the Book of Mormon is still the one that Joseph testified happened. It was translated by the gift and power of God.

  3. david, I asked the same question on my previous post on the Spaulding manuscript, and I never got a satisfactory answer. even Dale Broadhurst stopped by and his answer seemed inadequate. it is all conspiracy theory that rigdon needed joseph as a front man, but rigdon somehow under estimated joseph’s leadership abilities. even after joseph’s death, one would think rigdon could have claimed credit for partially translating. the d&c even shows oliver tried to translate and failed, seemingly giving rigdon a perfect opportunity to slip in the spaulding manuscript. this whole conspiracy makes no sense to me.

  4. Having only read the abstract, you write: “I guess my biggest problem with Jockers is this. Quoting from the corrected abstract, “With the corrected data, NSC ranked Rigdon at 0.4626 and Spalding at 0.46525.” If I am understanding this correctly. . . ”

    You are *not* understanding it correctly. That is the figure for just one chapter for which the attribution changed after a correction to the data file. The differences in probabilities for other chapters were significantly greater.

    I’d strongly encourage you to read the entire paper. Most libraries can order in a copy via interlibrary loan if you don’t want to cough up the $28 bucks.

    The reasons for excluding Smith are also spelled out carefully in the essay.

  5. Duke,

    Thanks for the corrections–I didn’t realize that was only one chapter. I will see if I can get my hands on a copy.

    I know they said they couldn’t find “original” samples of Smith’s writings, yet BYU studies did, and the Joseph Smith papers should have some copies of writings in Joseph’s hand. It is irresponsible to publish an article stating that Sidney Rigdon is the author, when they in fact left out THE most significant possibility. Yes they did include reasons for excluding Smith, but that is a GLARING weakness, IMO.

  6. Duke:

    You apparently already have access to the paper. What were the probabilities of authorship (or correlation coefficients) which you saw published? I ask because authors normally state their most compelling statistical evidence for their main conclusion in an abstract — not some minor correction for a chapter.

    As stated, what you have is non-random involvement compared to a control group of poets, not non-random involvement compared with the purported author. In fact, it would be interesting to see if there is correlation of multiple authors with the multiple authorship wordprint studies — which is where I think Dale Broadhurst was trying to go. If, as you suggest, there are radically different outcomes in the published statistics for the different chapters analyzed, do modern author breaks match author breaks within the text itself in a consistent fashion?

  7. Summarizing a complex analysis is more than I can take on here, but one important distinction to make is that the “probabilities” discussed are not probabilities that x was the author of chapter n. Rather, these are probabilities of one candidate author vs another (within the closed set of candidates). In other words, the probabilities tell us who was the most likely author from among the authors tested. It’s entirely possible that some other author who was not in the closed set, Smith perhaps (or God!), was the true author of a given chapter.

    Jockers et al. base their decision to exclude Smith on the work of Mormon scholar Dean Jessee. They cite Jessee’s edited collection of Smith writings and quote sections from Jessee that show that even Jessee is not convinced that the writings attributed to Smith can in fact be fairly attributed to Smith. Yes, there are a few docs in Smith’s hand, but according to Jockers et al. not enough to constitute a good sample.
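
The point about probabilities "within the closed set" can be illustrated with a minimal sketch. The candidate names and likelihood values below are hypothetical placeholders, not figures from either study; the only thing shown is the arithmetic of rescaling over whoever is in the lineup.

```python
def closed_set_probabilities(likelihoods):
    """Rescale nonnegative likelihoods so they sum to 1 across the candidates given."""
    total = sum(likelihoods.values())
    if total <= 0:
        raise ValueError("at least one candidate needs a positive likelihood")
    return {name: value / total for name, value in likelihoods.items()}


# Hypothetical likelihoods of one chapter under three candidate styles.
likelihoods = {"author_a": 0.0030, "author_b": 0.0025, "author_c": 0.0045}

all_three = closed_set_probabilities(likelihoods)

# Drop author_c from the lineup: the raw numbers for a and b are unchanged,
# but their share of the total goes up because the total got smaller.
without_c = closed_set_probabilities(
    {name: value for name, value in likelihoods.items() if name != "author_c"}
)

print(all_three)
print(without_c)
```

With these made-up numbers, author_a is the runner-up when all three are tested and the "most likely author" once author_c is left out, even though nothing about author_a's fit to the chapter changed.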

  8. Got it, but that makes the entire exercise pointless, don’t you think? Telling me that Rigdon is far more likely than Longfellow to have written the Book of Mormon is like telling me I’m far more likely to be killed by a bear than a shark. I’m far more likely to die of a heart attack or kidney failure, but if you haven’t actually seen my medical records…

  9. I have to agree with FireTag here. It’s like putting together a police lineup without the guy who committed the crime. The eyewitness chooses someone, and then we assume they’re guilty because the witness picked them? There’s something seriously flawed here. I’m sorry Jockers didn’t feel anything was reliably Joseph’s handwriting, but that is a GLARING problem. It also seems to me that the BYU studies article had more authors to choose from, so I’m not sure why Jockers limited his sample size. Can you tell me the probability for the Isaiah and Malachi chapters? I’ll have to see if I can get an inter-library loan to check out the article in more detail.

    Duke, Dale Broadhurst and Craig Criddle stopped by here a few months ago. I did Part 1 and Part 2 of the Spaulding Manuscript. Dale seemed to think that Rigdon wrote some chapters, Cowdery wrote some, and Spaulding wrote some in a sort of collaborative Book of Mormon. (I’m not sure why Joseph was a puppet in this whole process, but that’s another question….) Anyway, do you know if Dale’s work is similar to Jockers?

  10. Well, at least the guy who’s already confessed to the crime ought to be in the lineup. 😀

  11. Regarding which candidate authors were selected: my sense is that these specific candidates were selected because these were individuals who had previously been posed as authors (e.g. by Howe, the Conneaut witnesses, etc). In other words, the entire Jockers study was constructed to test the existing hypothesis that Rigdon and/or Spalding were contributors; they didn’t just pull the names out of a hat.

    So, to use your analogy, assume first that you are in a closed room with a bear and a shark. Now predict which beast you are more likely to be killed by. That’s not pointless at all. Again, I think it would be best if you read the whole article. Most all of this is spelled out. I wish I could just post the darn thing, but it’s a copyright issue. . .

  12. not quite duke- the bear (Joseph) is missing from the room. and the shark is swimming in a tank. unless you’re stupid enough to stick your hand in the tank, the shark (rigdon) can’t kill you. that’s the problem with the study.

  13. after reviewing firetag’s medical records, we discover he had a bad ticker, and was so scared of sharks that he had a heart attack and died. so, the shark may have indirectly influenced his death, but it was really his weak heart that killed him. that’s the problem with the study. it’s pointing to the shark.

  14. “Contrary to what most people say, the most dangerous animal in the world is not the lion or the tiger or even the elephant. It’s a shark riding on an elephant’s back, just trampling and eating everything they see.” – Jack Handey

  15. I cross posted this at Mormon Matters if anyone wants to follow the discussion there.

    See http://mormonmatters.org/2010/03/06/dueling-wordprint-studies/

  16. I’m currently studying statistics, and am familiar with the Jockers/Witten/Criddle study. It’s quite frankly riddled with holes. Some BYU Statistics professors have coauthored a paper with a couple of others refuting Jockers et al. and submitted it to the same journal, but with the slow process of peer review it will most likely be some time before it is published (if the journal accepts it at all). I’ll try to give you my best understanding of it below. I’ve read the preprint, but don’t have explicit permission to talk about it, so I’ll keep this anonymous.

    NSC is a relatively new technique developed in large part by Rob Tibshirani, Witten’s doctoral advisor at Stanford (note she’s a doctoral student, not a professor), so I assume she brought that to the table; Jockers specializes in computer textual analysis, and Criddle brings the anti-Mormon motivation for the paper. It’s a formidable combination.

    First and foremost, the technique assumes that the author must be one of the seven candidate authors–it’s a case of “out of these seven, who is most likely to have written it?”, so there’s no possibility of concluding “none of the above.” The BYU study introduces a “latent author” option, which gives the “none of the above” option. Essentially, it’s a dummy author whose writing style is just inconsistent with the observed style, so that if one of the candidate authors’ styles is consistent with the observed style that author will be chosen, but if none of the candidate authors’ styles are consistent with the observed style, the latent author will be chosen.

    Jockers et al. make a big deal out of large differences in posterior probability, where one author has a probability of over 90% and the others have very low probabilities. The BYU study shows that when all the authors’ styles are inconsistent with the observed style and a latent author is not included (i.e., the method used in the Jockers study), the author whose style is _least_ inconsistent will have an inflated probability in this manner. The BYU paper gives an example where the Jockers method, given the same candidate authors as for the Book of Mormon, attributes over half of Alexander Hamilton’s Federalist papers to Sidney Rigdon, simply because it has nobody better to attribute them to.

    The revised study with a latent author attributes authorship to the latent author for almost three quarters of the chapters in the Book of Mormon, which suggests that the true authors (Mormon et al, if you like) are not among the candidate authors. For the Federalist papers example, it attributes only two of the 51 papers to Rigdon and all the rest (out of 50 or so, if I recall correctly) to the latent author (i.e., Hamilton).
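
As a rough illustration of the inflation effect described above, here is a toy sketch. It is not the method of either paper: the distances, the unit-variance Gaussian likelihood, equal priors, and the `LATENT_DISTANCE` cutoff are all assumptions made up for the example.

```python
import math


def posterior(log_likelihoods):
    """Normalize log-likelihoods into probabilities that sum to 1 (equal priors)."""
    peak = max(log_likelihoods.values())
    weights = {name: math.exp(ll - peak) for name, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def gaussian_log_likelihood(distance):
    """Log-likelihood (up to a constant) of a text `distance` units from a style centroid."""
    return -0.5 * distance ** 2


# Hypothetical distances from one text to each candidate's style centroid.
# Every candidate is a poor fit; "Rigdon" is merely the least poor.
distances = {"Rigdon": 6.0, "Spalding": 7.0, "Cowdery": 7.5, "Pratt": 8.0}

# Closed set: the probability mass must be shared among the listed candidates.
closed = posterior({a: gaussian_log_likelihood(d) for a, d in distances.items()})

# Open set: add a latent author whose fit is pinned at a hypothetical
# "3 units away" level, so it wins only when every candidate is farther than that.
LATENT_DISTANCE = 3.0
open_scores = {a: gaussian_log_likelihood(d) for a, d in distances.items()}
open_scores["latent"] = gaussian_log_likelihood(LATENT_DISTANCE)
opened = posterior(open_scores)

closed_winner = max(closed, key=closed.get)
open_winner = max(opened, key=opened.get)
print(closed_winner, round(closed[closed_winner], 4))
print(open_winner, round(opened[open_winner], 4))
```

In the closed set the least-bad candidate ends up with nearly all of the probability, simply because the numbers have to sum to 1. Once a "none of the above" option with a tolerable fit is allowed, that same candidate's share collapses.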

    I think the jury is still out on wordprint studies, but Jockers et al. really isn’t very statistically sound. There are other problems in addition to the one I’ve outlined, but this is already very long…

  17. Also: I skimmed through the discussion at Mormon Matters, but it’s far too active for me to participate in. Somebody asked whether anybody has done any investigation into the effects of translation on wordprint. I’m told that a BYU student working with one of the statistics professors has undertaken such a study, but I don’t know anything about what results they may have found. Perhaps I can find something out.

    Should you want to reference or copy my comments here on Mormon Matters, feel free. I’ll try to participate here.

  18. AR:

    Thanks for this. The “latent author” technique will have its own methodological issues, (e.g., how does one get a control group of latent authors — statesmen work better than poets, it seems) but the issue is now statistically joined.

    Besides, we now understand the modern conservative Mormon bent toward the Founders. Hamilton was in on the conspiracy. 😀

  19. Reader, thanks for the update on BYU Wordprint studies. The comments at Mormon Matters are routinely closed after a month or so in order to solve a big SPAM problem over there, but I welcome your updates over here. When the BYU study comes out, I’ll definitely want to read it and post an update.

    I think this idea of a latent author is an important concept, but as I mentioned over at Mormon Matters, I question how accurately any Wordprint study can identify a candidate author, and I think this Federalist Papers exercise is a perfect example. Including a latent author (such as Mormon) may help in reducing false positives such as Rigdon with the Federalist Papers, but I think Stanley Fish and Roland Barthes might be on to something.

    I don’t believe that a Wordprint is anywhere close to as reliable as a fingerprint, and especially not as reliable as DNA evidence. I’d say a Wordprint may be close to the idea of a hair sample, or a shoe print. For example, a hair sample might be able to show the perpetrator of a crime is white, black, or asian, but it’s nowhere close to being able to say “defendant x” did the crime. Now perhaps over time, techniques can be improved, but currently, I can’t see wordprint studies as any more reliable than a hair sample left at a crime scene. (But it is an interesting statistical technique.)

    I have to say that I am grateful to have been able to come across the Jockers Study right after posting at Mormon Matters. It definitely helped greatly with that discussion. I’d love to come across any other Wordprint studies, especially in relation to the Book of Mormon.

  20. From reading all of the above, I came away with these thoughts:

    1. A brand new statistical technique, which may not be well understood is involved here. Because it is not all that well understood, it could easily be misapplied.
    2. There has been, as of now, no scholarly response to the original article because not enough time has passed.
    3. The paper does not appear to spend any time looking at possible differences between reported internal authors.
    4. The original paper seems to be deeply flawed in failing to examine either the idea that Joseph Smith was the author or that “none of the above” was the author. Reasons for excluding Joseph Smith were given but were exceptionally weak. So weak that one must really suspect some bias.
    5. At least one of the authors of the paper had an agenda beyond ordinary scholarship.
    6. Rigdon had a style that statistically seems to look like other people’s writing at times (Hamilton for example).
    7. Wordprint analysis may not be quite as strong as DNA or Fingerprinting but it is pretty darned good.

  21. Charles, thanks for stopping by. This is a fun topic for me.

    I agree with 1, 2, 6, 7. Let me address some of your other points. On 3, I don’t understand what differences you seem to be implying. Could you clarify?

    On 4, after reading the paper, I empathize with the authors a bit regarding Joseph Smith. They couldn’t find anything they considered reliable in his handwriting, so that is why they excluded him. From what I understand, they are trying to run this study again with Joseph Smith and some other candidate authors (such as Ethan Smith.) Criddle seems enthusiastic about his results, but I am not aware that they have been published. Still, the exclusion of Smith is a big problem.

    On 5, Criddle did seem to have a bit of an agenda as a former Mormon. But, he did get the paper published, so some reviewers thought it was legitimate. I guess I don’t want to condemn his motives too much, but I do think they should be noted. Even if someone might be biased, if he does good science, it’s still good science.

    We shouldn’t condemn the study simply because Criddle believes the BoM was written by Rigdon, just as we shouldn’t be too quick to praise BYU with a confirmatory study showing Mormon as the real author. The merits of the science in the papers are fair game to be analyzed, and we should note author biases. If their conclusions are legit, then they should be taken seriously.

  22. I know this thread has gone stale, but Schaalje and friends from BYU have published a retort to Jockers et al.’s original study in the same journal. Their findings were basically opposite those of Jockers et al. The premise of the study is that Jockers et al. used a closed NSC (nearest shrunken centroid) to address the authorship of the Book of Mormon whereas the BYU folks used an open NSC. The primary difference is that the closed NSC does not take into account that the author might not be any of the chosen test cases (someone other than Spaulding, Rigdon, Cowdery, Smith, or Pratt) whereas the open NSC allows for a “latent” (or none of the above) author to account for the authorship. While the latter seems like a more reasonable approach to the question, it always seems suspicious that the BYU folks are the only ones who authored this paper. I see no reason why (if their study is valid) Schaalje et al. couldn’t have found some non-Mormon colleagues to jump on board. Nevertheless and notwithstanding, this is turning out to be an interesting dialogue.

    You can find the Schaalje et al. paper here:
    http://llc.oxfordjournals.org/content/early/2011/01/18/llc.fqq029.abstract

  23. Chris, I really appreciate you bringing this to my attention. Thank you very much! Perhaps I will do a follow up post after I have fully digested the article! (I’m pleased to see that I was able to view the PDF.)

  24. […] really appreciate a comment by Chris Spencer on my previous post Dueling Wordprint Studies.  In that post, I had discussed a controversial […]

  25. […] prove the Spaulding theory, but I think has been thoroughly debunked.  I also think the original Jockers wordprint study was successfully refuted by […]

  27. […] there’s been a lot of wordprint studies trying to identify [a specific author.] There was the Stanford study that said, “Oh, see, Solomon Spalding was the real author.”  Whereas, then BYU guys […]

  28. […] there’s been a lot of wordprint studies trying to identify [a specific author.] There was the Stanford study that said, “Oh, see, Solomon Spalding was the real author.”  Whereas, then BYU guys used the […]
