About this blog

Case Disclosed is a blog written by students, supervising attorneys, directors, alumni, and friends of the Media Freedom & Information Access Clinic.

The views expressed on this blog belong to the author(s) and do not represent the views of Yale Law School or the Media Freedom and Information Access Clinic (MFIA).

Could Algorithmic Review Level the FOIA Playing Field?

April 17, 2019

Plaintiffs and defendants in Freedom of Information Act (FOIA) lawsuits do not necessarily litigate on a level playing field. In some respects FOIA advantages plaintiffs, since records are disclosable by default unless the defendant affirmatively demonstrates that one or more of FOIA’s enumerated exemptions applies. When it comes to actually litigating those exemptions, though, plaintiffs suffer from an inherent asymmetry of information that can make it difficult to meaningfully scrutinize the defendant’s claims.

While defendants have complete knowledge about the records plaintiffs seek, plaintiffs often know little to nothing about them. When defendants make claims about the substance or nature of those records to substantiate withholdings, then, plaintiffs are not well situated to evaluate those claims. Discovery is of little aid: FOIA lawsuits are typically resolved at the summary judgment stage without it, since discovery is generally considered inappropriate where it would short-circuit the process of determining whether disclosure is required in the first instance. FOIA’s structure thus contains an inherent tension: on the one hand, plaintiffs need to be able to meaningfully test the defendant’s case, but on the other hand, the nondisclosure of contested records must be maintained until a court determines that none of FOIA’s exemptions apply.

Typically, plaintiffs utilize three tools to gain insight into the nature of information or records withheld by the defendant. First, litigants might look at context clues within partially redacted records, or across other disclosed records, to make inferences about the withheld information. Any such inferences, however, remain a matter of subjective interpretation and could lead to erroneous assumptions about the nature of the material. 

Second, courts are empowered by statute to conduct in camera review of withheld information to determine the propriety of the withholdings. But courts are often reluctant to do so, both because such review consumes the court’s limited resources and because courts generally defer to agency determinations in FOIA cases and are hesitant to reject an agency’s reasoning.

Third, where defendants provide a Vaughn index describing the contested records, plaintiffs can analyze the index to gain insights about the nature of the withholdings. Defendants’ indices are likely to contain only the bare minimum of detail, though, as any information beyond the “reasonable specificity” required of them could unwittingly provide plaintiffs with valuable information to use against defendants.

These three tools, then, do not entirely resolve the asymmetry that plaintiffs face when litigating FOIA exemptions. If plaintiffs cannot themselves directly view the contested records, the next best means of scrutinizing the defendant’s claims would be for the records to be examined by a neutral third party. In cases where courts opt not to assume that role, what if there were another neutral third party available to review the records? Computer review of FOIA materials — what we might call algorithmic review — could potentially provide an entirely new and effective means by which a defendant’s claims can be thoroughly examined, without implicating the same concerns as direct review or in camera review. 

The suggestion that algorithmic review might be useful in the FOIA context is not entirely new. In National Day Laborer Organizing Network v. U.S. Immigration and Customs Enforcement Agency, et al., Judge Shira Scheindlin noted that agencies conducting searches to comply with FOIA requests might rely on novel techniques including “latent semantic indexing, statistical probability models, and machine learning tools” to find responsive documents. A simple keyword search, she noted, might not be considered adequate in light of recent technological developments. Though Judge Scheindlin’s suggestion extended only to the adequacy of the initial search, the same principles motivate applying similar technology to third-party review of the responsive documents.

The techniques described by Judge Scheindlin are, in fact, already used by lawyers for both e-discovery and contract management. Four features in particular might prove uniquely useful in reviewing withheld material in FOIA cases: conceptual searching, document classification, data visualization, and data classification.  

Conceptual searches, unlike traditional searches, return documents based on similarities in semantic meaning rather than just plain text. In other words, conceptual searches identify documents that contain relevant subject matter, regardless of how that subject matter is reflected in the verbiage of the document. Conceptual searching could theoretically be used to verify that the subject matter of a document is plausibly related to the claimed exemption, or to test the description of the material provided in the defendant’s Vaughn index. 
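To make the contrast with keyword matching concrete, here is a minimal Python sketch of a concept-level search. It is a toy under stated assumptions, not how any particular e-discovery product works: the `CONCEPTS` lexicon, the function names, and the sample memos are all hypothetical, and the hand-written lexicon stands in for the semantic associations a real tool would learn from data (for example, through latent semantic indexing).

```python
import math
import re
from collections import Counter

# Hypothetical lexicon mapping surface words to broader concepts.
# A real conceptual-search tool would learn these associations from data.
CONCEPTS = {
    "informant": "confidential_source",
    "tipster": "confidential_source",
    "investigation": "law_enforcement",
    "detective": "law_enforcement",
    "surveillance": "law_enforcement",
    "budget": "finance",
    "invoice": "finance",
    "payment": "finance",
}


def concept_vector(text):
    """Count the concepts (not the literal words) that appear in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)


def cosine(a, b):
    """Cosine similarity between two concept-count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def conceptual_search(query, documents):
    """Return names of documents that share concepts with the query, best first."""
    q = concept_vector(query)
    scored = [(cosine(q, concept_vector(text)), name)
              for name, text in documents.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


documents = {
    "memo_a": "The tipster met the detective twice.",
    "memo_b": "Invoice and payment schedule for the budget.",
}
# memo_a shares no literal word with the query, yet it matches on concepts.
print(conceptual_search("informant investigation", documents))
```

A plain keyword search for "informant" or "investigation" would miss memo_a entirely, which is the gap conceptual searching is meant to close.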

Document classification, also known as “technology assisted review” or TAR, involves the use of algorithms to classify documents into distinct groups based on a predefined set of criteria. Especially cutting-edge TAR packages are able to classify documents without any training data. In other words, these algorithms can make determinations about certain material without any initial manual calibration to test and verify the results. Document classification features could be used to verify a defendant’s description of withheld material, and also to provide valuable metadata to plaintiffs that may not itself be exempt from disclosure. This classification could be performed prior to any manual in camera review, in order to facilitate and expedite the process.

Software that analyzes and visualizes patterns and relationships among documents could enable users to see the relationships between documents within a cache, as well as the larger patterns that emerge from the cache as a whole. Such visualization tools could prove useful where agencies do not provide an itemized list of the withheld material. Agencies withholding certain law enforcement records, for instance, sometimes need only defend their withholdings on a categorical basis, as opposed to document-by-document. An analysis of the patterns and relationships among the responsive material could test the legitimacy of the proffered categories.

Finally, some e-discovery tools are able to classify information within documents. For instance, certain e-discovery platforms are able to identify and flag personally identifiable information within discoverable documents in order to ensure that information is properly redacted (where appropriate) before being produced. Data classification tools could prove immensely useful with respect to validating redactions within a document, particularly those withheld on privacy-related grounds. Similarly, sensitive technical or financial information might be automatically flagged for trade secrets exemption claims.
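As a rough illustration of data classification, the Python sketch below flags a few common kinds of personally identifiable information with regular expressions. The patterns, the `flag_pii` name, and the sample sentence are assumptions made for this example; commercial platforms rely on far more robust detection than three regexes.

```python
import re

# Illustrative patterns only: U.S.-style SSNs and phone numbers, plus emails.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def flag_pii(text):
    """Return (kind, matched_text) pairs in the order they appear in the text."""
    hits = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group()))
    return [(kind, value) for _, kind, value in sorted(hits)]


sample = "Contact Jane at jane.doe@example.gov or 203-555-0142; SSN 123-45-6789."
for kind, value in flag_pii(sample):
    print(kind, value)
```

A reviewer could compare the flagged spans against the redactions an agency actually made. A redaction covering none of the flagged categories, or a flagged span left unredacted, would be a natural place to ask for further explanation.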

Commentators have already suggested that machine learning will become an important part of FOIA compliance efforts. It stands to reason that its use could be extended to assist in camera review by determining whether the defendant’s exemption claims are plausible. Before analyzing documents manually, courts could use software packages to process the entire body of responsive records, testing the content of the records against the language supplied by defendants in support of their claims. Alternatively, defendants could agree to perform their own algorithmic review using a reliable and independently verified third-party software package. Where algorithmic classifications of the withheld material do not comport with the defendant’s descriptions or declarations, a court might then request supplementary briefing or declarations to explain the inconsistencies.
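The comparison step described above might look something like the following Python sketch. Everything in it is hypothetical: the keyword cues, the entry format, and the function names are invented for illustration, and a simple keyword lookup stands in for whatever classifier a court or neutral reviewer would actually run over records that only they can see.

```python
import re

# Hypothetical keyword cues for three FOIA exemptions: 4 (trade secrets and
# commercial information), 6 (personal privacy), and 7 (law enforcement).
EXEMPTION_CUES = {
    "4": {"pricing", "proprietary", "formula"},
    "6": {"medical", "personnel", "address"},
    "7": {"investigation", "informant", "surveillance"},
}


def plausible_exemptions(text):
    """Return the exemptions whose cue words appear in the record text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {ex for ex, cues in EXEMPTION_CUES.items() if words & cues}


def find_inconsistencies(entries):
    """Return ids of entries whose claimed exemption the text does not support.

    Each entry is a dict with 'id', 'claimed' (exemption number as a string),
    and 'text' (the withheld record, visible only to the reviewer).
    """
    return [e["id"] for e in entries
            if e["claimed"] not in plausible_exemptions(e["text"])]


entries = [
    {"id": "doc-1", "claimed": "6", "text": "Employee medical leave request."},
    {"id": "doc-2", "claimed": "7", "text": "Vendor pricing schedule for cafeteria services."},
]
print(find_inconsistencies(entries))
```

A flagged entry would not decide anything on its own. In keeping with the process sketched above, it would only mark a record for which the court might ask the defendant for supplementary briefing or declarations.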

Compared to manual review, algorithmic review might be more readily available, encompass a broader range of material, and be applied more consistently. Courts’ concerns about the expenditure of judicial resources would be mitigated, as would their concern about being thrust into thorny discretionary questions. In short, algorithmic review of contested FOIA material might provide an opportunity for plaintiffs to meaningfully scrutinize claims made by defendants without having to directly view the contested material themselves.

Useful as these features might be, the technological advances needed for effective algorithmic review of FOIA materials are likely far off, as this type of review presents a more difficult challenge than conducting an initial search. Records cannot always be fully understood in isolation; often they must be evaluated within a broader class of material. Human review might thus be needed when evaluating FOIA material, because determinations about the propriety of a given exemption may require the consideration of information outside the domain of the reviewable material. For example, Exemption 7 claims (concerning law enforcement records) may depend on information about pending investigations not reflected in the records directly responsive to the FOIA request. An algorithm examining only the body of contested material might not be able to establish the associations needed to properly test the claim. Moreover, the legitimacy of exemption claims may depend on subjective determinations that algorithms are not well suited to evaluate. FOIA’s Exemption 6, for instance, exempts from disclosure information whose release would constitute a “clearly unwarranted invasion of personal privacy.” But whether or not an invasion of privacy is unwarranted often depends on a subjective weighing of the private interests at stake against the public interest in disclosure.

As the technology currently exists, then, effective algorithmic review of FOIA material may not be possible. But machine learning techniques are constantly evolving, and it may not be long until tools exist that are able to scrutinize the application of particular exemptions to entire records or certain information contained within them. When that day arrives, courts ought to strongly consider the use of algorithmic review — subject to the appropriate constraints — in order to level the playing field between FOIA plaintiffs and defendants.