Aston University’s Institute for Forensic Linguistics (AIFL) is part of the AUTHOR research consortium, which has won an $11.3 million contract to infer the authorship of uncredited documents from their writing style.
The acronym stands for ‘Attribution, and Undermining the Attribution, of Text while providing Human-Oriented Rationales’. Aston University’s $1.3 million share of the project is being led by Professor Tim Grant and Dr Krzysztof Kredens, who are both internationally recognised experts in authorship analysis and who both act as expert witnesses in forensic linguistic casework.
Beyond their broader expertise and experience in this area, Professor Grant has specific expertise in using linguistic analysis to enhance online undercover policing, and Dr Kredens has led projects developing authorship identification techniques that work across very large numbers of potential authors.
The AUTHOR team is led by Charles River Analytics and is one of six research teams selected for the Human Interpretable Attribution of Text Using Underlying Structure (HIATUS) programme, sponsored by the Intelligence Advanced Research Projects Activity (IARPA). The programme uses natural language processing and machine learning techniques to create stylistic fingerprints that capture the writing style of specific authors.
On the flip side is authorship privacy – mechanisms that can anonymise the identities of authors, especially when their lives are in danger. Pitting the attribution and privacy teams against each other should push each to improve, says Dr Terry Patten, principal scientist at Charles River Analytics and principal investigator of the AUTHOR consortium.
“One of the big challenges for the programme and for authorship attribution in general is that the document you’re looking at may not be in the same genre or on the same topic as the sample documents you have for a particular author,” Patten says. “The same applies to languages: we might have example articles for an author in English but need to match the style even if the document at hand is in French. Authorship privacy has its challenges too: users must obfuscate the style without changing the meaning, which can be difficult to execute.”
In the area of authorship attribution, the research and casework experience from Aston University will help the team identify and use a broad spectrum of authorship markers. Authorship attribution research has typically relied on words and their frequencies as identifying characteristics. However, Professor Grant’s previous work on online undercover policing has shown that higher-level discourse features – how authors structure their interactions – can be important ‘tells’ in authorship analysis.
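As a rough illustration of the traditional word-frequency approach mentioned above, the sketch below builds a simple style profile from the relative frequencies of common function words and attributes a questioned text to whichever candidate author’s sample is most similar. This is a minimal sketch under stated assumptions, not the AUTHOR consortium’s method: the short function-word list, the use of cosine similarity, and the names `style_profile`, `cosine` and `attribute` are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list of English function words.
# Real stylometric studies use longer, carefully chosen lists.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "for",
                  "with", "as", "but", "on", "not", "this", "be", "by", "at", "or"]


def style_profile(text: str) -> list[float]:
    """Relative frequency of each function word in the text (a crude 'fingerprint')."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two profiles; 0.0 if either profile is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(questioned: str, samples: dict[str, str]) -> str:
    """Return the candidate author whose sample profile is most similar to the questioned text."""
    q = style_profile(questioned)
    return max(samples, key=lambda author: cosine(q, style_profile(samples[author])))


if __name__ == "__main__":
    # Hypothetical candidate samples, for illustration only.
    samples = {
        "author_a": "the cat and the dog and the bird of the house",
        "author_b": "it is not that it is but it is",
    }
    print(attribute("the fox and the hen of the farm", samples))
```

Word-frequency profiles like this are easy to compute but, as the article notes, can be thrown off when the questioned document differs in genre, topic or language from the samples, which is part of why richer features such as discourse structure are of interest.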
The growth of natural language processing (NLP) and one of its underlying techniques, machine learning, is motivating researchers to harness these new technologies in solving the classic problem of authorship attribution. The challenge, Patten says, is that while machine learning is very effective at authorship attribution, “deep learning systems that use neural networks can’t explain why they arrived at the answers they did.”
Evidence in criminal trials can’t afford to hinge on such black-box systems. This is why a core requirement of AUTHOR is that it be “human-interpretable.” Dr Kredens’s research has explored how explanations can be drawn out of black-box authorship attribution systems, so that their findings can be integrated into linguistic theory about who we are as linguistic individuals.
Initially, the project is expected to focus on feature discovery: beyond words, what features can we discover to increase the accuracy of authorship attribution?
The project has a range of promising applications – identifying counterintelligence risks, combating misinformation online, fighting human trafficking, and even figuring out the authorship of ancient religious texts.
Professor Grant said: “We were really excited to be part of this project, both as an opportunity to develop new findings and techniques in one of our core research areas and because it provides further recognition of AIFL’s international reputation in the field.” Dr Kredens added: “This is a great opportunity to take our cutting-edge research in this area to a new level.”
Professor Simon Green, Pro-Vice-Chancellor for Research, commented: “I am delighted that the international consortium bid involving AIFL has been successful. As one of Aston University’s four research institutes, AIFL is a genuine world-leader in its field, and this award demonstrates its reputation globally. This project is a prime example of our capacities and expertise in the area of technology, and we are proud to be a partner.”
Patten is excited about AUTHOR’s promise, believing it is poised to make fundamental contributions to the field of NLP. “It’s really forcing us to address an issue that’s been central to natural language processing,” Patten says. “In NLP and artificial intelligence in general, we need to find a way to build hybrid systems that can incorporate both deep learning and human-interpretable representations. The field needs to find ways to make neural networks and linguistic representations work together.”
“We need to get the best of both worlds,” Patten says.
The team includes some of the world’s foremost researchers in authorship analysis, computational linguistics, and machine learning from Illinois Institute of Technology, Aston Institute for Forensic Linguistics, Rensselaer Polytechnic Institute, and Howard Brain Sciences Foundation.