I really appreciate a comment by Chris Spencer on my previous post Dueling Wordprint Studies. In that post, I had discussed a controversial study completed by Stanford researchers Matthew Jockers, Daniela Witten, and Craig Criddle, who concluded that 57% of the Book of Mormon was authored by Sidney Rigdon and Solomon Spaulding. (There was an interesting discussion at Mormon Matters as well.) Part of the reason they included Rigdon and Spaulding as candidate authors was the Spaulding Theory. Here’s a bit of background.
Ever since the Book of Mormon was published in 1830, critics have tried to show that it came forth as the result of fraud. One of the earliest theories was the Spaulding Theory. As the theory goes, Solomon Spaulding wrote an unpublished novel about a group of Romans from the time of Constantine who were blown off course from Britain to the Americas. Somehow (never adequately explained) Sidney Rigdon obtained the manuscript and then transferred it surreptitiously to Joseph Smith, who added the religious material. Fawn Brodie put together an appendix in her book No Man Knows My History outlining problems with the theory. (I wrote about this in a post called Debunking the Spaulding Theory.) Most people think the theory has been debunked, though it still has some adherents, such as Dale Broadhurst, who maintains a website in favor of the theory.
Wordprint studies try to determine the true author of a text. The idea of a wordprint is similar to a fingerprint. Each person uses a certain set of common words such as “a,” “but,” “and,” and “the” in a way that is unique. By collecting information on word usage, a wordprint can theoretically identify an author.
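To make the idea concrete, here’s a minimal sketch (my own toy example, not the method from either study) of turning a text into a wordprint vector of function-word frequencies:

```python
from collections import Counter

# A handful of function words; real studies track 100 or more of them.
FUNCTION_WORDS = ["a", "an", "and", "but", "of", "the", "to"]

def wordprint(text):
    """Relative frequency of each function word in `text`."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return [counts[w] / len(tokens) for w in FUNCTION_WORDS]

sample = "and it came to pass that the people of the land did gather"
print(wordprint(sample))
```

Comparing vectors like these across texts, by distance or with a statistical classifier, is what lets a wordprint method guess at authorship.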
In 2008, Matthew Jockers, Daniela Witten, and Craig Criddle of Stanford University created a stir when they published a peer-reviewed article in the Oxford journal Literary and Linguistic Computing. The authors concluded that major portions of the Book of Mormon exhibited Sidney Rigdon and Solomon Spaulding’s writing styles, creating a resurgence of interest in the Spaulding Theory. Traditionally, wordprint studies have used a statistical technique known as the Delta method. Jockers et al compared the Delta method to a newer technique called Nearest Shrunken Centroid (NSC). NSC had been used in cancer studies, but this was the first time it had been used in wordprint studies. The Jockers study found the NSC method to be much more reliable than the Delta method. Many New Order Mormons and anti-Mormons were pleased with the study. But there were some big questions about the method.
In January 2011, Bruce Schaalje, Paul Fields, and Matthew Roper of BYU, along with Gregory Snow of Intermountain Health Care, released a study outlining problems with the Jockers study in the same Oxford journal, Literary and Linguistic Computing. While acknowledging that NSC is a good method for wordprint studies, they detailed several problems with the Jockers study, noting that a “naive application of NSC methodology” led to “misleading results.” Jockers et al had used a closed set of 7 candidate authors for their study. Schaalje’s study showed that an open set of candidate authors “produced dramatically different results from a closed-set NSC analysis.”
The beginning of the Schaalje article discusses a bit of mathematical theory (I’ll spare you). Schaalje notes that his study has a foundation in theory, rather than resting on empirical evidence alone like the Jockers study; that makes Schaalje’s study a bit stronger. Schaalje was able to reproduce Jockers’s results, and then applied the same technique to another document: the Federalist Papers. To demonstrate a problem with Jockers’s technique, they purposely excluded Alexander Hamilton from the list of candidate authors, and picked other authors to see which author Jockers’s closed-set method would choose. The candidate authors were Joseph Smith, early Sidney Rigdon, late Sidney Rigdon, Solomon Spaulding, Oliver Cowdery, and Parley P. Pratt.
Early or late Rigdon was falsely chosen as the author of 28 of the 51 Hamilton texts with inflated posterior probabilities ranging as high as 0.9999 (Fig. 2). Pratt was falsely chosen as the author of 12 of the papers, and Cowdery was falsely chosen as the author of the remaining 11 papers. These results dramatically demonstrate the danger of misapplying closed-set NSC.
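The forced choice behind those misattributions is easy to see in code. Here is a bare-bones sketch of a closed-set nearest shrunken centroid classifier; the numbers are invented, and it keeps only the shrinkage idea, omitting the per-feature variance standardization of real NSC:

```python
# Toy closed-set NSC sketch (invented numbers, not either paper's data).
# Features are function-word rates per 1,000 words.

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def soft_threshold(x, delta):
    # Shrink a centroid's deviation toward zero by `delta`; this is the
    # "shrunken" part of NSC -- noisy features collapse to the overall mean.
    if x > delta:
        return x - delta
    if x < -delta:
        return x + delta
    return 0.0

def fit_nsc(groups, delta):
    overall = centroid([row for rows in groups.values() for row in rows])
    shrunk = {}
    for author, rows in groups.items():
        c = centroid(rows)
        shrunk[author] = [o + soft_threshold(ci - o, delta)
                          for ci, o in zip(c, overall)]
    return shrunk

def classify(x, shrunk):
    # Closed-set rule: ALWAYS returns one of the training authors.
    dist = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    return min(shrunk, key=lambda author: dist(x, shrunk[author]))

train = {
    "A": [[12.0, 30.0, 5.0], [11.0, 29.0, 6.0]],
    "B": [[20.0, 10.0, 9.0], [21.0, 11.0, 8.0]],
}
shrunk = fit_nsc(train, delta=1.0)
print(classify([50.0, 50.0, 50.0], shrunk))  # forced to pick "A" or "B"
```

The last line is the whole problem Schaalje identified: a text written by neither candidate still gets assigned to whichever centroid happens to be nearer.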
Schaalje et al noted that Jockers et al should have used a “goodness of fit” test to verify how well their attributions actually fit the test texts, and proposed a method to compute the goodness of fit.
An important extension to NSC classification is to allow an open set, i.e. the possibility that the test texts might not be authored by any of the candidate authors. We propose that this can be done by positing an unobserved author for each test text in addition to the observed candidates in the training data. We propose an unobserved author with a distribution of literary features just barely consistent with the test text. Thus, as a straightforward extension of the NSC classification model, we suggest that posterior probabilities for the candidate authors be calculated as…
(Once again, I’ll spare the mathematical proof.) They applied this goodness of fit test and said,
Applying this extended model to the Hamilton texts with Smith, early Rigdon, late Rigdon, Spalding, Cowdery, and Pratt as training authors, only 2 of the test texts were assigned to early or late Rigdon, while the remaining 49 were assigned to an unobserved author (obviously Hamilton) (Fig. 5).
As a further test of the open-set NSC procedure, in addition to Rigdon, etc., we included Hamilton as a training author represented by the first 25 Hamilton papers. We classified the remaining 26 Hamilton papers as test texts. We first used the closed-set model. All 26 Hamilton test texts were correctly assigned to Hamilton; none was assigned to an unobserved author. The goodness-of-fit procedure (Fig. 4, right panel) indicated that the closed-set model was valid. We then used the open-set model. All 26 Hamilton test texts were still correctly assigned to Hamilton. Hence, when the actual author was included in the training set, the allowance for an unobserved author as in Equation (10) did not appear to compromise the ability of open-set NSC to correctly attribute authorship. It is important to note that the open-set NSC procedure does not indicate how many unobserved authors there are. All we know is that if an unobserved author is selected for a test text, one unobserved author is most probable as the author of that text. There could be as many unobserved authors as the number of test texts, or as few as one. A clustering procedure would provide some information as to the total number of unobserved authors.
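The paper’s open-set rule works with posterior probabilities, positing an unobserved author just barely consistent with each test text. A much cruder sketch of the same idea (hypothetical centroids and cutoff value, my own simplification rather than the paper’s method) is a simple rejection threshold:

```python
# Hypothetical shrunken centroids for two training authors (invented numbers).
centroids = {"A": [12.5, 28.5, 6.5], "B": [19.5, 11.5, 7.5]}

def dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def classify_open(x, centroids, cutoff):
    # Open-set rule: if even the nearest centroid fits poorly,
    # attribute the text to an unobserved author instead.
    best = min(centroids, key=lambda author: dist(x, centroids[author]))
    return best if dist(x, centroids[best]) <= cutoff else "unobserved author"

# A text near author A's style is still attributed to A ...
print(classify_open([12.0, 29.0, 6.0], centroids, cutoff=100.0))   # "A"
# ... but a text far from every candidate is no longer forced onto one.
print(classify_open([50.0, 50.0, 50.0], centroids, cutoff=100.0))  # "unobserved author"
```

This captures why the open-set model reassigned 49 of the 51 Hamilton papers away from the Book of Mormon candidates: Hamilton’s style simply wasn’t close enough to any of them.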
In order for the NSC method to work, writing samples from the candidate authors are needed. Schaalje et al noted that the Jockers study had test texts ranging from 114 to 17,979 words, and training texts ranging from 95 to 3,752 words. The BYU authors flagged this wide disparity in text sizes as another problem.
The measurement of 100 or more word frequencies on texts of less than 100 words is almost sure to produce unreliable measurements (Holmes and Kardos, 2003). For the delta procedure, Burrows (2003, p. 21) found that ‘with texts of fewer than two thousand words in length… the test gradually becomes less effective’. Others have worked with texts of 1,000, 5,000, and 10,000 words (Larsen et al., 1980; Hilton, 1990; Holmes, 1992).
To test whether the size of the training text matters, the BYU authors used 8 Rigdon samples ranging in length from 100 to 5,000 words.
we recommend in general that the training data involve only texts of at least 1,000 words because feature-specific variances do not change greatly with text size beyond 1,000. Within limits, the problem of training text size variation can be dealt with simply by compositing shorter texts of known authorship to create training texts of at least 1,000–2,000 words. Hoover (2004), in fact, found that combining several texts ‘helps to improve accuracy’ of authorship attribution.
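The compositing fix the quote describes is straightforward to implement. Here is one sketch (my own, not the paper’s code) that greedily merges short same-author texts until each composite reaches the recommended minimum size:

```python
def composite(texts, min_words=1000):
    """Greedily merge short same-author texts until each composite has at
    least `min_words` words (the papers suggest 1,000-2,000)."""
    composites, current, n = [], [], 0
    for t in texts:
        current.append(t)
        n += len(t.split())
        if n >= min_words:
            composites.append(" ".join(current))
            current, n = [], 0
    if current:  # fold any leftover scraps into the last composite
        if composites:
            composites[-1] += " " + " ".join(current)
        else:
            composites.append(" ".join(current))
    return composites

short_texts = [" ".join(["lorem"] * 400)] * 5  # five 400-word scraps
merged = composite(short_texts)                # every composite >= 1,000 words
```

The merged composites can then be used as training texts without the small-sample noise Schaalje warns about.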
After Sidney Rigdon left the church, he started his own church in Pennsylvania. (I blogged about this group previously here and here.) As noted in the 2nd link, Sidney Rigdon had many revelations between 1863 and 1873. Schaalje used these revelations, which vary widely in length, to see whether the method would falsely attribute them to other authors depending on the size of the test texts.
To illustrate the effects of both extensions (Equations 10 and 12) to the NSC method, we applied the closed-set NSC method and the two extensions to 95 ‘revelations’ attributed to Sidney and Phebe Rigdon between 1863 and 1873, decades after Rigdon had left the Mormon movement. The test texts ranged in size from 60 to 4,128 words. We used the Smith, Cowdery, Spalding, Pratt, and early Rigdon texts as the training data, and specified informative priors based on the fact that Smith, Cowdery, Spalding, and Pratt had all died long before 1863. The closed-set NSC model attributed the texts mainly to Rigdon and Smith (Fig. 7), the open-set NSC model attributed most of the texts to latent authors, and the fully expanded NSC model attributed the texts to Rigdon, Smith, and latent authors. The point here is that open-set NSC without adjustments for test texts sizes is inadequate if some of the test texts are very small.
In the Jockers study, they noted some false positives. For example, Longfellow (one of the two control authors) was listed as an author of the Isaiah–Malachi chapters in the Book of Mormon. Jockers noted the problem but did not investigate further, feeling confident because Isaiah–Malachi was correctly identified in 20 of 21 chapters. Schaalje looked further into this “false positive” problem.
A disturbing feature of classification analysis when the set of test texts is large is that test texts on the stylistic fringe of the distribution for the true author can occur by chance, and may therefore ‘stray’ into the distribution of a nearby author. This explains why 2 of the 51 Hamilton texts were assigned to Rigdon (Fig. 5), and might partially explain why 21 of the 95 late Rigdon texts classified strongly as writings of Smith (Fig. 7) even though Smith had died 20 years earlier. Historians who study this period would be hard-pressed to imagine any way that Rigdon could have retained otherwise unknown Smith texts.
The same problem was observed by Hoover (2004) with regard to the delta method. He noted (Hoover, 2004, p. 460) that for particular sets of authors and texts, ‘false attributions are a serious possibility’. Burrows (2002, p. 281) similarly cautioned that ‘the system for distinguishing between insiders and outsiders is not foolproof’ because of its dependence on probabilities rather than absolutes.
This problem, which is exacerbated by heterogeneity in text sizes, is an example of the multiplicity or multiple comparisons problem in statistics (Benjamini and Hochberg, 1995). One not completely satisfactory solution would be to composite all of the test texts into one or a few large texts, and then classify those texts. We combined the Sidney texts into two large texts, combined the joint Phebe–Sidney texts into one large text, and combined the Phebe texts into one text. Assigning realistic prior probabilities, the first Sidney text was classified to an unobserved author, the second Sidney text was assigned to Cowdery, the joint text was assigned to Cowdery, and the Phebe text was assigned to an unobserved author. These results indicate, at a minimum, that the authorship style of the late Rigdon texts was different from that of Rigdon’s earlier writings. This may be due to genre differences, the passage of time, or the interposition of editors. In any case, the cause of the difference is not germane to this study.
Schaalje doesn’t have a solution to the problem of false positives, but is continuing to study an idea for dealing with unequally sized sequential texts. Finally, Schaalje reached a very different conclusion from Jockers.
Using closed-set NSC, Jockers et al. (2008) attributed 37% of the chapters to Rigdon, 28% to Isaiah/Malachi, 20% to Spalding, 9% to Cowdery, 5% to Pratt, and 1% to Longfellow. In contrast, using open-set NSC, we conclude that 73% of the chapters cannot be reliably attributed to any of the candidate authors. We first note that Jockers et al. (2008) bolstered their NSC attributions by claiming close agreement between attribution results due to Burrows’ delta and those due to closed-set NSC. That these stylistic measures would nominally agree well numerically is not surprising because Burrows’ delta stylistic distance is closely related to the quadratic delta stylistic distance (Argamon, 2008) upon which NSC is based.
However, there actually is strong disagreement between the closed-set NSC results and the delta results. This is because delta-z scores should not be taken seriously unless they are very small (i.e. very negative). Burrows (2003) found that a threshold of −1.9 separated most false positives from true attributions for a set of 17th-century poets. Jockers et al. (2008) failed to do this. In the Jockers et al. (2008) study, only 16 of the 239 chapters had delta-z values as small as −1.9 (Fig. 9). Ten of these 16 chapters were essentially verbatim quotations of Isaiah/Malachi, and all 10 were correctly attributed to Isaiah/Malachi. Four additional chapters were attributed to Isaiah/Malachi and the others to Rigdon and Spalding. The remaining 223 chapters had large delta-z values and were thus apparently false positives. Hence, the delta results of Jockers et al. (2008) actually say little more than what is already uncontroversial about Book of Mormon authorship: that some of the chapters are quotations of Isaiah and Malachi. The delta-z results do not, in fact, attribute sizeable percentages of the chapters to Rigdon, Spalding, or Cowdery.
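For reference, Burrows’ delta itself is easy to sketch: standardize each function-word frequency into a z-score against a reference corpus, then average the absolute z-score differences between a test text and a candidate author, where smaller means closer in style. The numbers below are invented, and this toy version omits the delta-z standardization and thresholding Schaalje discusses:

```python
from statistics import mean, stdev

# Reference corpus: rows are texts, columns are function-word rates
# per 1,000 words (invented numbers).
corpus = [[12.0, 30.0, 5.0], [11.0, 29.0, 6.0],
          [20.0, 10.0, 9.0], [21.0, 11.0, 8.0]]
mu = [mean(col) for col in zip(*corpus)]
sigma = [stdev(col) for col in zip(*corpus)]

def zscores(freqs):
    return [(f - m) / s for f, m, s in zip(freqs, mu, sigma)]

def delta(test_freqs, candidate_freqs):
    # Burrows' delta: mean absolute difference of z-scores.
    zt, zc = zscores(test_freqs), zscores(candidate_freqs)
    return mean(abs(a - b) for a, b in zip(zt, zc))

# A stylistically similar pair scores a small delta; a dissimilar pair
# scores a large one -- but note that delta only ranks candidates, which
# is exactly why a cutoff is needed before trusting an attribution.
print(delta([12.0, 30.0, 5.0], [11.0, 29.0, 6.0]))
print(delta([12.0, 30.0, 5.0], [21.0, 11.0, 8.0]))
```

Without a cutoff like Burrows’ threshold, the nearest candidate “wins” no matter how distant, which is the same forced-choice failure Schaalje found in the closed-set NSC results.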
I have to say that the BYU guys really thought through this problem well. Jockers has plans for an updated study that includes Joseph Smith and other authors. Judging from the BYU study, I think the Stanford folks have some serious problems. What are your thoughts?