02 May 2023

AI, Patent Reading and Patent Disclosure

'Misleading AI: Regulatory Strategies for Transparency in Information Intermediary Tools for Consumer Decision-Making' by Jeannie Marie Paterson in (2023) Loyola Consumer Law Review comments

Increasingly, consumers’ decisions about what to buy are mediated through digital tools promoted as using “AI”, “data” or “algorithms” to assist consumers in making decisions. These kinds of digital information intermediaries include such diverse technologies as recommender systems, comparison sites, virtual voice assistants, and chatbots. They are promoted as effective and efficient ways of helping consumers make decisions in the face of otherwise insurmountable volumes of information. But such tools also have the potential, amongst other possible harms, to mislead consumers, including about the tools’ own capacity, efficacy, and identity. Most consumer protection regimes contain broad and flexible prohibitions on misleading conduct that are, in principle, fit to tackle the harms of misleading AI in consumer tools. This article argues that, in practice, the challenge may lie in establishing that a contravention has occurred at all. The key characteristics that define AI-informed consumer decision-making support tools (opacity, adaptivity, scale, and personalization) may make contraventions of the law hard to detect. The paper considers whether insights from proposed frameworks for ethical or responsible AI, which emphasise the value of transparency and explanations in data-driven models, may be useful in supplementing consumer protection law in responding to concerns about misleading AI, as well as the role of regulators in making transparency initiatives effective.

'Linguistic metrics for patent disclosure: Evidence from university versus corporate patents' by Nancy Kong, Uwe Dulleck, Adam B Jaffe, Shupeng Sun and Sowmya Vajjala in (2023) 52(2) Research Policy comments 

Encouraging disclosure is important for the patent system, yet the technical information in patent applications is often inadequate. We use algorithms from computational linguistics to quantify the effectiveness of disclosure in patent applications. Relying on the expectation that universities have more ability and incentive to disclose their inventions than corporations, we analyze 64 linguistic measures of patent applications, and show that university patents are more readable by 0.4 SD of a synthetic measure of readability. Results are robust to controlling for non-disclosure-related invention heterogeneity. The linguistic metrics are evaluated by a panel of “expert” student engineers and further examined against USPTO rejections under 35 U.S.C. §112(a) for lack of disclosure. The ability to quantify disclosure opens new research paths and potentially facilitates improvement of disclosure. ...

The patent system serves two purposes: “encouraging new inventions” and “adding knowledge to the public domain”. The former incentivizes creation, development, and commercialization by protecting inventors’ exclusive ownership for a limited period of time. The latter encourages disclosure of new technologies by requiring “full, clear, concise, and exact terms” in describing inventions. Sufficient disclosure in patents has three major benefits: (1) fostering later inventions (Jaffe and Trajtenberg, 2002, Scotchmer and Green, 1990, Denicolò and Franzoni, 2003); (2) reducing resources wasted on duplicate inventions (Hegde et al., 2022); and (3) inducing more informed investment in innovation (Roin, 2005).

Despite a large body of literature on the patent incentivizing function (Cornelli and Schankerman, 1999, Kitch, 1977, Tauman and Weng, 2012, Cohen et al., 2002), patent disclosure receives limited attention. This raises concerns; as Roin (2005), Devlin (2009), Sampat (2018), Arinas (2012) and Ouellette (2011) document, the technical information contained in patent documents is often inadequate and unclear. Important questions, such as how to measure disclosure, potential incentives behind disclosure, heterogeneous levels of disclosure by entities, and the tactics used to avoid the disclosure requirement, have not been directly investigated. A major barrier to such empirical research has been the lack of broadly applicable, reproducible quantitative measures of the extent of disclosure or information accessibility. We propose and demonstrate that extant metrics developed in computational linguistics can help to fill this gap.

In using computational linguistic metrics to compare the readability of documents, we follow researchers in the finance and accounting literature, who have used readability metrics to gauge whether readers are able to extract information efficiently from financial reports (Li, 2008, Miller, 2010, You and Zhang, 2009, Lawrence, 2013). This literature posits that more complex texts increase the information processing cost for investors (Grossman and Stiglitz, 1980, Bloomfield, 2002) and finds, for example, that companies are likely to hide negative performance in complicated text to obfuscate that information (You and Zhang, 2009). 

Although patent applications differ from corporate annual reports, the research question regarding strategic obfuscation is similar: Documents are created subject to regulation, in which the purpose of the regulation is to compel disclosure, but the party completing the document may have incentives to obscure information. Our proposed linguistic measures are likely to serve as an informative proxy for the explicitly or implicitly chosen level of disclosure. The goal of this article is simply to demonstrate that these measures do appear to capture meaningful differences in accessibility or disclosure, thereby opening up the possibility of research on the causes and effects of variations in disclosure.

Our strategy for demonstrating the relevance of linguistic readability metrics is to identify a situation in which we have a strong a priori expectation of a systematic difference in disclosure across two groups of patents. If the proposed metrics show the expected difference, we see this as an indication to treat them as potentially useful. We compare patent applications from universities with those of corporations. Both strategic reasons and the costs of revealing information inform our expectations. From a strategic perspective, universities, with their focus on licensing of patents, have an interest in making their patents more accessible. In contrast, corporations (particularly practicing corporations) may benefit from limiting the accessibility of information. From a cost perspective, drafting patents is usually informed by documentation of the relevant research or process of innovation. Given university researchers’ primary interest in accessible publications and the relevant standards of documentation, the source material available to an attorney drafting a patent may be much better than in the case of the same attorney drafting a patent for a corporation, where the need for such documentation is much smaller. The literature also supports this expectation (Trajtenberg et al., 1997, Henderson et al., 1998, Cockburn et al., 2002).

Universities and corporations follow different business models for patenting: technology transfer versus in-house commercialization. Patents applied for by universities, with a focus on generating income from the licensing of inventions, should have a higher level of disclosure because transparent information makes it easier to signal the technology contained in the patent and attract potential investors. As a result, they should be more readable than corporate patents. The readability difference could be further magnified by the moral requirements of university research as well as the rigor of academic writing, both of which could also affect the level of disclosure.

Corporations, on the other hand, particularly those focused on in-house production, have a greater incentive to obfuscate crucial technical information to deter competitors from understanding, using, and building on their inventions. The profit-maximizing motive, as well as a lack of incentive to thoroughly document the invention, could also contribute to the low level of disclosure. Taken together, these considerations make it reasonable to assume that universities may strategically (or unconsciously) choose a higher level of disclosure in patent applications than corporations do. We emphasize that we do not see this analysis as testing the hypothesis that universities engage in more disclosure than corporations for a particular reason. Rather, we take this as a maintained hypothesis and show, conditional on that maintained hypothesis, that the linguistic measures meaningfully capture differences in disclosure across patents, which indicates the value of further research and the need to reconsider patent examination with respect to the accessibility and disclosure of information contained in patents.

Similar to the finance literature, we use a computational linguistic program designed to assess the reading difficulty of texts using 64 measures from second language acquisition research. The indicators cover the lexical, syntactic, and discourse aspects of language along with traditional readability formulae. We apply them to a full set of U.S. patent application texts in three cutting-edge industries from the past 20 years. Our baseline OLS estimations reveal significant differences between university and corporate patents. Using principal component analysis (PCA) to combine the 64 indicators and create synthetic readability measures, we show that composite indices detect strong differences between university and corporate patents, which lends support to the validity of our measures. 
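The idea of collapsing many indicators into one synthetic readability index can be illustrated with a minimal sketch. It assumes a hypothetical documents-by-indicators matrix `X` (the paper's 64 indicators are not reproduced here), standardizes each column, and uses the first principal component as the composite score. The helper name `pca_readability_index` and the sign convention are illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def pca_readability_index(X: np.ndarray) -> np.ndarray:
    """First-principal-component score for each row of a documents x indicators matrix.

    Hypothetical helper for illustration; assumes every column has nonzero variance.
    """
    # Standardize each indicator so measures on different scales
    # (counts, ratios, formula scores) contribute comparably.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The right singular vectors of the standardized data are the PCA loadings.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    loadings = vt[0]
    # A component's sign is arbitrary: orient it to agree with the first
    # indicator, assumed here to be a measure where higher means harder to read.
    if loadings[0] < 0:
        loadings = -loadings
    scores = Z @ loadings
    # Rescale to unit variance so group differences read in standard deviations.
    return scores / scores.std()


# Usage on synthetic data: five hypothetical indicators driven by one latent
# "difficulty" factor plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=500)
X = np.column_stack(
    [w * latent + rng.normal(scale=0.5, size=500) for w in (1.0, 0.8, 1.2, 0.6, 0.9)]
)
index = pca_readability_index(X)
print(np.corrcoef(index, latent)[0, 1])  # close to 1 when one factor dominates
```

Rescaling the component to unit variance is what allows a gap between groups to be reported in standard deviations. With real readability indicators the orientation needs care, since some (such as Flesch Reading Ease) rise as text gets easier.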

The key empirical challenge is that the nature of corporate and university inventions might differ, so the textual communication they require could also differ. To address this concern, our identification strategy has four elements. First, to account for the unobserved heterogeneity in linguistic characteristics intrinsic to technical fields, our econometric method controls for U.S. patent subclass fixed effects. This enables us to measure disclosure as the degree of readability relative to other technologically similar patents. Second, we use patent attorney fixed effects to control for systematic disclosure effects from the drafting agents, so that we compare university and corporate patents drafted by the same patent attorney. Third, we employ cited-patent fixed effects, selected with the least absolute shrinkage and selection operator (LASSO), a shrinkage and variable-selection technique, to further control for the nature of inventions. This is because university and corporate patents that cite the same previous patents build on the same prior knowledge and are therefore likely to be technologically similar inventions. Fourth, to deal with any selection bias on observables, we use a doubly robust estimation that combines propensity score matching and regression adjustment. This enables us to compare university and corporate patents with similar attributes.
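The first element, subclass fixed effects, can be sketched as a within transformation. The outcome and the university indicator are demeaned inside each subclass, and then one is regressed on the other (the Frisch-Waugh-Lovell result). The data are simulated, and all names and effect sizes (`within_estimate`, the true effect of -0.4) are hypothetical. The attorney fixed effects, LASSO-selected cited-patent effects and doubly robust step described above are not shown.

```python
import numpy as np


def within_estimate(y, d, groups) -> float:
    """Coefficient on d from OLS of y on d with group fixed effects absorbed.

    Demeans y and d within each group, then runs a no-intercept regression of
    demeaned y on demeaned d (Frisch-Waugh-Lovell). Groups in which d does not
    vary contribute nothing to the estimate.
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    groups = np.asarray(groups)
    y_dm, d_dm = y.copy(), d.copy()
    for g in np.unique(groups):
        mask = groups == g
        y_dm[mask] -= y[mask].mean()
        d_dm[mask] -= d[mask].mean()
    return float(d_dm @ y_dm / (d_dm @ d_dm))


# Simulated example: subclasses differ in baseline difficulty, and the
# hypothetical "university" flag is more common in harder subclasses.
rng = np.random.default_rng(1)
n_groups, per_group = 200, 50
groups = np.repeat(np.arange(n_groups), per_group)
alpha = rng.normal(size=n_groups)[groups]  # subclass effect on difficulty
d = (alpha + rng.normal(size=groups.size) > 1.0).astype(float)
y = 1.5 * alpha - 0.4 * d + rng.normal(size=groups.size)  # true effect: -0.4

naive = y[d == 1].mean() - y[d == 0].mean()  # confounded by subclass
fe = within_estimate(y, d, groups)           # compares within subclass only
print(naive, fe)
```

In this simulation the naive gap even suggests that the flagged documents are harder to read, because they cluster in difficult subclasses. Demeaning within subclass removes that confounding. This is the same logic as measuring readability relative to technologically similar patents.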

Our results show that corporate patents are 0.4 SD more difficult to read and require 1.1–1.6 years more education to comprehend than university patents. We find that the difference is more prominent for more experienced patent applicants, and that licensing corporate patents disclose more than other corporate patents, which we believe supports the idea that the differences in readability are at least somewhat intentional. We also show that a potential channel for obfuscation lies in the provision of many examples in order to conceal the “best mode” of inventions. 
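Grade-level formulas express difficulty as years of US schooling. That is presumably how a gap can be stated as “1.1–1.6 years more education”, although this excerpt does not say which formulas produce that figure. Below is a minimal sketch of three classic indices with their standard published coefficients. The vowel-group syllable counter and the simplified definition of a “complex” word (three or more syllables) are rough approximations assumed for illustration. Dedicated readability tools use dictionaries and more careful tokenization.

```python
import re


def count_syllables(word: str) -> int:
    """Rough syllable estimate: count runs of vowels (treating 'y' as a vowel)."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    # A final silent 'e' (as in "make") usually adds no syllable; "-le" usually does.
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def readability_scores(text: str) -> dict:
    """Flesch Reading Ease, Flesch-Kincaid grade and Gunning Fog for non-empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = [count_syllables(w) for w in words]
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = sum(syllables) / len(words)
    complex_share = sum(1 for s in syllables if s >= 3) / len(words)
    return {
        # Higher = easier to read.
        "flesch_reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Approximate US school grade needed to understand the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        # Also an approximate years-of-schooling estimate.
        "gunning_fog": 0.4 * (words_per_sentence + 100 * complex_share),
    }


plain = "The cell stores charge. It is small and safe."
dense = ("The electrochemical apparatus incorporates heterogeneous nanostructured "
         "electrodes facilitating accelerated intercalation kinetics.")
for name, text in [("plain", plain), ("dense", dense)]:
    print(name, readability_scores(text))
```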

This paper is one of the first to use textual analysis specifically to examine patent disclosure (with the exception of Dyer et al. (2020), who focus on patent examiners’ leniency) and to validate the measure. We obtain the whole set of full-text patent applications in categories related to nanotechnology, batteries, and electricity from 2000 to 2019, totaling 40,949 applications, and apply our linguistic analysis model to the technical descriptions of these patents. We expand on readability studies in the related literature, which rely heavily on traditional readability indices such as Gunning Fog, Kincaid, and Flesch Reading Ease, by including lexical richness, syntactic complexity, and discourse features. We use the best non-commercial readability software (Vajjala and Meurers, 2014b) to capture the multidimensional linguistic features of 64 indicators, and perform a more in-depth linguistic analysis (Loughran and McDonald, 2016) than previous studies. We also use principal components analysis to construct synthetic overall measures of readability.
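Beyond the traditional formulas, lexical richness is often measured with variants of the type-token ratio (distinct words divided by total words). Below is a minimal sketch of a few common variants. It is not the implementation in the Vajjala and Meurers software, and the lowercase regex tokenizer is an assumption. Raw TTR tends to fall as texts get longer, and the root and corrected versions try to offset this by dividing by the square root of the token count.

```python
import math
import re


def lexical_richness(text: str) -> dict:
    """Type-token-ratio variants; assumes the text contains at least two words."""
    tokens = re.findall(r"[a-z]+", text.lower())  # naive tokenizer (assumption)
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    return {
        "ttr": n_types / n_tokens,                            # type-token ratio
        "root_ttr": n_types / math.sqrt(n_tokens),            # Guiraud's index
        "corrected_ttr": n_types / math.sqrt(2 * n_tokens),   # Carroll's corrected TTR
        "bilog_ttr": math.log(n_types) / math.log(n_tokens),  # Herdan's log TTR
    }


repetitive = "the cell holds the charge and the cell holds the charge"
varied = "each porous anode stores lithium while ceramic separators block dendrite growth"
print(lexical_richness(repetitive))
print(lexical_richness(varied))
```

Both sample sentences have 11 tokens, so the variants rank them the same way. Across patents of very different lengths, the length-adjusted variants (or fixed-size text samples) would normally be preferred over raw TTR.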

Having developed this rich set of readability measures, we validate them as indicators of effective patent disclosure by testing whether they show patents to be more readable in several real-world contexts where greater disclosure is expected. Our primary comparison is between university and corporate patents. The licensing aims of universities and the absence of market-driven competitive motives mean that they have a greater incentive to disclose (and less incentive to conceal) key information relative to corporations. Through analyses that control for sources of variation in readability, we find that university patents are, indeed, more readable. We support this main analysis with several other comparisons. Intellectual Ventures, a corporation that, like universities, seeks to license its patents rather than compete in the market, also holds patents with above-average readability. Several large corporations known to be active patent licensors (IBM, Qualcomm, and HP) similarly exhibit higher readability. Additionally, a set of patents that can be presumed to have been reassigned also exhibits higher readability than otherwise similar patents. Finally, we compare the computational readability measures to subjective evaluations of readability and disclosure for a small number of patents, and assess the readability of patents rejected by the USPTO for reasons that include failure to adequately disclose the technology.

We see the role of this paper as analogous to Trajtenberg et al. (1997), who first introduced metrics of patent “importance”, “generality” and “originality” based on patent citation data. We follow their strategy of testing whether our proposed new measures reveal the contrast we expect between university and corporate patents, and argue that the finding that they display the predicted pattern can be taken as initial evidence that they capture meaningful variation in unobservable patent disclosure quality. The introduction and initial validation of these measures open up the possibility of quantitative treatment of the extent of disclosure in patents, both for social science research on the sources and effects of better or worse disclosure, and potentially for a more systematic treatment of the disclosure obligation in the patent examination process.

The rest of the paper proceeds as follows. In Section 2, we review the relevant literature and lay out our hypothesis of differences in disclosure between university and corporate patent applications. Section 3 explains the linguistic measures used in the study. Section 4 presents our data and baseline estimation, followed by our main results in Section 5. We examine attorney fixed effects and cited-patent fixed effects in Section 6, and one channel that corporations could use to obscure patent applications in Section 7. We show heterogeneous effects in Section 8 and usefulness tests in Section 9, and conclude in Section 10.