A Possible Alternative to Secretive DNA Analysis

[January 1, 2018: This post was updated to clarify facts regarding STRmix's defense access policy and the 2015 coding error that resulted in the review of several dozen cases in Australia.]

In October, MFIA worked with ProPublica to lift a protective order on the source code for the New York City Office of the Chief Medical Examiner’s (OCME) “Forensic Statistical Tool” (FST). The FST was used in DNA analysis in criminal investigations to calculate a likelihood ratio, which compares how probable the observed DNA evidence is if a known suspect contributed DNA to a given sample versus if an unknown person did.
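To make the idea of a likelihood ratio concrete, here is a minimal sketch in Python. It is not the FST's method: the function name and the probabilities are hypothetical, chosen only to show the arithmetic. Real tools arrive at the two probabilities through far more elaborate statistical models.

```python
def likelihood_ratio(p_evidence_if_suspect: float, p_evidence_if_unknown: float) -> float:
    """Divide the probability of the DNA evidence if the suspect contributed
    by the probability of the same evidence if an unknown person contributed.

    Both inputs are assumptions supplied by the caller; this function does
    not model DNA at all.
    """
    if not 0.0 <= p_evidence_if_suspect <= 1.0:
        raise ValueError("p_evidence_if_suspect must be between 0 and 1")
    if not 0.0 < p_evidence_if_unknown <= 1.0:
        raise ValueError("p_evidence_if_unknown must be greater than 0 and at most 1")
    return p_evidence_if_suspect / p_evidence_if_unknown


# Hypothetical numbers, for illustration only.
lr = likelihood_ratio(0.5, 0.001)
print(f"The evidence is about {lr:.0f} times more probable if the suspect contributed.")
```

A ratio above 1 favors the hypothesis that the suspect contributed, and a ratio below 1 favors the unknown-contributor hypothesis. Small changes to either input can move the result substantially, which is one reason the underlying calculations draw scrutiny.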

Major forensic DNA experts have questioned the validity of the FST. For example, Bruce Budowle, one of the architects of the FBI’s DNA database, called the OCME method “not defensible.”

Thankfully, MFIA and ProPublica won, and the source code is now public.[1] Programmers can now independently review it for errors.[2]  The case, however, raises a question of immense significance: should DNA tools like FST continue to be kept secret?

DNA forensic analysis programs, such as the FST, exist to create evidence for criminal trials. And, in the United States, there is a long tradition of open criminal trials, including public access to the evidence introduced at trial. In the words of the United States Supreme Court, “a presumption of openness inheres in the very nature of a criminal trial under this Nation's system of justice.” Richmond Newspapers, Inc. v. Virginia, 448 U.S. 555, 556 (1980).

So why is it an industry standard that DNA analysis tools be kept secret?

The answer seems to be simply that these are commercial endeavors with business interests at play. In a letter to the court, OCME asked that the FST source code be kept secret to protect its economic value: “Over the course of four years, the OCME spent thousands of hours and made extensive investments in training, use of reagents, laboratory supplies, consumables, equipment, and workspace to create, develop, validate, and implement the program. While the City has not yet chosen to monetize this investment, the program remains a valuable resource that could still have significant monetary value in the private sector.”

The source code for STRmix, the software that OCME has recently adopted in place of its in-house FST, is not publicly available. Rather, defense counsel can obtain access only if they pay for training or purchase the software and assent to a confidentiality agreement preventing public disclosure. STRmix says that “[the] software is best tested by examining the Extended Output for the compiled STRmix software, rather than the source code.” Other DNA analysis tools like TrueAllele are similarly shrouded.

Courts have upheld this secrecy, accepting corporate arguments that disclosure of the source code could put a DNA analysis company out of business. Some courts deny defendants any opportunity to review the source code for these programs,[3] while others allow review only under protective orders that, in addition to safeguarding the code, prevent any other criminal defendant from benefiting from the resources expended on the review.

Perhaps, as STRmix claims, analysis of the source code really is not the best method for confirming that the software works. However, in 2015, Australian authorities announced an error in the STRmix code that required review of sixty criminal cases. In twenty-three of those cases, the review changed the likelihood ratio that had been generated. That incident, along with reported issues with New York’s FST, underscores the need for open and independent review of the source code as a necessary part of validation.

The more interesting question here, though, is whether there is an alternative to highly commercial, proprietary, secretive DNA analysis tools.

New York’s OCME developed its FST and used it for years without ever monetizing it, before ultimately switching to STRmix. The FST was a major taxpayer investment that spent years in hiding and never became a source of profit. What if, instead of keeping the FST code hidden for that time, OCME had made its tool open source from the start? The code could have been independently audited years ago. If no major flaws were found, then scientists and programmers at governments, universities, or independent research groups could have worked to improve the program. Other government labs could also have adopted the independently verified open source tool instead of paying a private company for an unverified one.

OCME’s FST represented a major expenditure of government resources. It is proof that government entities are able and willing to pay for the research and development of this software. Going forward, the next time a government agency creates its own criminal justice algorithm, it should skip the wasteful secrecy and release the code for the public benefit.

[1] Source code is the human-readable form of a program. For most programming languages, a human programmer will type up a program, creating source code, and then use a compiler to convert that human-readable code into machine code, which can be understood and run by a computer. Compiled code cannot be reliably “decompiled” into what the programmer wrote and thus cannot be readily reviewed for errors. Compiled code can be “disassembled” into assembly language, which is significantly more difficult for humans to work with than higher-level languages. Some languages use an interpreter to run code rather than compiling it to machine code ahead of time.
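The gap between source code and its compiled form can be seen in a small Python sketch. This is an analogy only: Python compiles to bytecode for its own interpreter, not to machine code, and the `ratio` function is a hypothetical example that has nothing to do with any DNA analysis tool.

```python
import dis

# Human-readable source code for a tiny, hypothetical function.
SOURCE = "def ratio(a, b):\n    return a / b\n"

# Compile the source text, then run the result to define the function.
code = compile(SOURCE, "<example>", "exec")
namespace = {}
exec(code, namespace)
ratio = namespace["ratio"]

# The compiled form of the function body is a sequence of raw bytes.
raw_bytes = ratio.__code__.co_code

# Disassembly turns those bytes into named low-level instructions.
instructions = [ins.opname for ins in dis.get_instructions(ratio)]

print(SOURCE)
print(raw_bytes.hex())
print(instructions)
```

The source is two short lines that a reviewer can read at a glance. The compiled bytes are opaque, and the disassembled instruction list, while legible, is longer and further removed from what the programmer wrote.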

[2] The FST source code, however, remains the copyrighted property of its creator, the City of New York. Although open to auditing, it cannot be freely modified or used.

[3] See, e.g., People v. Carter, 36 N.Y.S.3d 48 (N.Y. Sup. Ct. 2016).