Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark
Recent model editing techniques promise to mitigate the problem of memorizing false or outdated associations during LLM training. However, we show that these techniques can introduce large unwanted side effects that are not detected by existing specificity benchmarks. We extend the existing CounterFact benchmark to include a dynamic component and dub our benchmark CounterFact+. Additionally, we extend the metrics used for measuring specificity with a principled KL-divergence-based metric. We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity. Our findings highlight the need for improved specificity benchmarks that identify and prevent unwanted side effects.
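A loose sketch of what such a metric can look like (a sketch under assumptions, not the paper's implementation; all names are illustrative): specificity is scored as the mean KL divergence between the unedited and edited model's next-token distributions on prompts the edit should not affect.

    import numpy as np

    def kl_divergence(p, q, eps=1e-12):
        # KL(P || Q) between two next-token distributions; eps guards log(0).
        p = np.asarray(p, dtype=float) + eps
        q = np.asarray(q, dtype=float) + eps
        p, q = p / p.sum(), q / q.sum()
        return float(np.sum(p * np.log(p / q)))

    def specificity_score(base_dists, edited_dists):
        # Mean divergence over prompts unrelated to the edit.
        # Lower is better: a specific edit leaves them untouched.
        return float(np.mean([kl_divergence(p, q)
                              for p, q in zip(base_dists, edited_dists)]))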
The Larger They Are, the Harder They Fail: Language Models do not Recognize Identifier Swaps in Python
Antonio Valerio Miceli Barone*, Fazl Barez*, Ioannis Konstas, and 1 more author
Large Language Models (LLMs) have successfully been applied to code generation tasks, raising the question of how well these models understand programming. Typical programming languages have invariances and equivariances in their semantics that human programmers intuitively understand and exploit, such as the (near) invariance to the renaming of identifiers. We show that LLMs not only fail to properly generate correct Python code when default function names are swapped, but some of them even become more confident in their incorrect predictions as the model size increases, an instance of the recently discovered phenomenon of Inverse Scaling, which runs contrary to the commonly observed trend of increasing prediction quality with increasing model size. Our findings indicate that, despite their astonishing typical-case performance, LLMs still lack a deep, abstract understanding of the content they manipulate, making them unsuitable for tasks that statistically deviate from their training data, and that mere scaling is not enough to achieve such capability.
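The failure mode is easy to make concrete. In a toy example of our own (not taken from the paper's data), two builtins are rebound at the top of a file; a semantically correct completion must then use each name under its new meaning, which is precisely what the evaluated models fail to do.

    # Illustrative identifier swap: rebind two Python builtins.
    len, print = print, len

    def count_items(items):
        # After the swap, the name `print` is the original len builtin,
        # so counting the items must be written as `print(items)`.
        return print(items)

    # `len` now writes to stdout and `print` returns a length.
    len(count_items([1, 2, 3]))  # prints 3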
Neuron to Graph: Interpreting Language Model Neurons at Scale
Alex Foote*, Neel Nanda, Fazl Barez*, and 3 more authors
Advances in Large Language Models (LLMs) have led to remarkable capabilities, yet their inner mechanisms remain largely unknown. To understand these models, we need to unravel the functions of individual neurons and their contribution to the network. This paper introduces a novel automated approach designed to scale interpretability techniques across a vast array of neurons within LLMs, to make them more interpretable and ultimately safe. Conventional methods require examination of examples with strong neuron activation and manual identification of patterns to decipher the concepts a neuron responds to. We propose Neuron to Graph (N2G), an innovative tool that automatically extracts a neuron’s behaviour from the dataset it was trained on and translates it into an interpretable graph. N2G uses truncation and saliency methods to emphasise only the most pertinent tokens to a neuron while enriching dataset examples with diverse samples to better encompass the full spectrum of neuron behaviour.
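A rough sketch of the truncation-and-saliency idea (the activation callable stands in for a real model hook, the function names are ours, and N2G's actual procedure differs in detail):

    def truncate_context(tokens, activation, threshold=0.5):
        # Keep the shortest suffix of the prompt that still drives the neuron:
        # drop leading tokens while activation on the final token stays above
        # `threshold` times its original value. `activation` maps a token list
        # to the neuron's activation on the last token.
        base = activation(tokens)
        start = 0
        while start < len(tokens) - 1 and activation(tokens[start + 1:]) >= threshold * base:
            start += 1
        return tokens[start:]

    def salient_tokens(tokens, activation, threshold=0.5):
        # Score each token by how much deleting it reduces the activation.
        base = activation(tokens)
        keep = []
        for i in range(len(tokens) - 1):  # the final token is where we read out
            pruned = tokens[:i] + tokens[i + 1:]
            if base - activation(pruned) >= threshold * base:
                keep.append(tokens[i])
        return keep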
Understanding Addition in Transformers
This paper presents an in-depth analysis of a one-layer Transformer model trained for n-digit integer addition. We reveal that the model divides the task into parallel, digit-specific streams and employs distinct algorithms for different digit positions. Our study also finds that the model starts calculations late but executes them rapidly. A rare use case with high loss is identified and explained.
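The digit-wise decomposition mirrors the structure of the task itself. As a plain worked example (illustrating the task, not the model's learned algorithm), each output digit depends only on the operand digits at its position plus a carry propagated from less-significant positions, and long carry chains are the awkward case:

    def per_digit_sum(a: str, b: str):
        # Addition decomposed into digit-specific streams: each output digit
        # needs only the operand digits at that position plus the carry.
        carry, digits = 0, []
        for da, db in zip(reversed(a), reversed(b)):
            s = int(da) + int(db) + carry
            digits.append(s % 10)
            carry = s // 10
        digits.append(carry)
        return digits[::-1]  # most-significant first, leading carry included

    # A cascading carry runs through every position at once:
    assert per_digit_sum("999", "001") == [1, 0, 0, 0]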
Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders
Luke Marks, Amir Abdullah, Luna Mendez, and 3 more authors
Large language models (LLMs) aligned to human preferences via reinforcement learning from human feedback (RLHF) underpin many commercial applications. However, how RLHF impacts LLM internals remains opaque. We propose a novel method to interpret learned reward functions in RLHF-tuned LLMs using sparse autoencoders. Our approach trains autoencoder sets on activations from a base LLM and its RLHF-tuned version.
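A minimal sketch of the core component, assuming PyTorch and illustrative names and hyperparameters (the paper trains sets of such autoencoders on activations from the base and RLHF-tuned models and compares the learned features):

    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        # Overcomplete autoencoder trained to reconstruct hidden activations,
        # with an L1 penalty pushing the code towards sparse, interpretable
        # features (d_hidden > d_model).
        def __init__(self, d_model, d_hidden):
            super().__init__()
            self.encoder = nn.Linear(d_model, d_hidden)
            self.decoder = nn.Linear(d_hidden, d_model)

        def forward(self, x):
            code = torch.relu(self.encoder(x))
            return self.decoder(code), code

    def sae_loss(x, recon, code, l1_coeff=1e-3):
        # Reconstruction error plus sparsity: few features fire per activation.
        return ((recon - x) ** 2).mean() + l1_coeff * code.abs().mean()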
AI Systems of Concern
Kayla Matteucci, Shahar Avin, Fazl Barez, and 1 more author
Concerns around future dangers from advanced AI often centre on systems hypothesised to have intrinsic characteristics such as agent-like behaviour, strategic awareness, and long-range planning. We label this cluster of characteristics as "Property X". Most present AI systems are low in "Property X"; however, in the absence of deliberate steering, current research directions may rapidly lead to the emergence of highly capable AI systems that are also high in "Property X". We argue that "Property X" characteristics are intrinsically dangerous, and when combined with greater capabilities will result in AI systems for which safety and control is difficult to guarantee.
DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
As large language models (LLMs) become more capable, there is an urgent need for interpretable and transparent tools. Current methods are difficult to implement, and accessible tools to analyze model internals are lacking. To bridge this gap, we present DeepDecipher - an API and interface for probing neurons in transformer models’ MLP layers. DeepDecipher makes the outputs of advanced interpretability techniques for LLMs readily available. The easy-to-use interface also makes inspecting these complex models more intuitive.
The Alan Turing Institute’s response to the House of Lords Large Language Models Call for Evidence
Fazl Barez, Philip H. S. Torr, Aleksandar Petrov, and 24 more authors
2023
Fairness in AI and Its Long-Term Implications on Society
Ondrej Bohdal*, Timothy Hospedales, Philip H. S. Torr, and 1 more author
Successful deployment of artificial intelligence (AI) in various settings has led to numerous positive outcomes for individuals and society. However, AI systems have also been shown to harm parts of the population due to biased predictions. AI fairness focuses on mitigating such biases to ensure AI decision making is not discriminatory towards certain groups. We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time and act as a social stressor.
Exploring the Advantages of Transformers for High-Frequency Trading
Fazl Barez, Paul Bilokon, Arthur Gervais, and 1 more author
This paper explores novel deep learning Transformer architectures for high-frequency Bitcoin-USDT log-return forecasting and compares them to traditional Long Short-Term Memory models. A hybrid Transformer model, called HFformer, is then introduced for time series forecasting; it incorporates a Transformer encoder, a linear decoder, spiking activations, and a quantile loss function, and does not use positional encoding.
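Of these components, the quantile (pinball) loss is simple to state; a minimal PyTorch sketch (the function name and default q are ours):

    import torch

    def quantile_loss(pred, target, q=0.5):
        # Pinball loss for quantile q: under-prediction is penalised by q,
        # over-prediction by (1 - q); q=0.5 recovers half the absolute error.
        err = target - pred
        return torch.mean(torch.maximum(q * err, (q - 1) * err))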
Benchmarking Specialized Databases for High-frequency Data
This paper presents a benchmarking suite designed for the evaluation and comparison of time series databases for high-frequency data, with a focus on financial applications. The suite covers four specialized databases: ClickHouse, InfluxDB, kdb+ and TimescaleDB. The results demonstrate that kdb+ has the highest performance amongst the tested databases, while also highlighting the strengths and weaknesses of each database.
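The measurements such a suite rests on reduce to repeated timed queries; a hedged sketch (query_fn stands in for a real client call, and the suite's actual methodology is more involved):

    import time

    def benchmark(query_fn, n_runs=100):
        # Wall-clock latency distribution for one query against one database.
        times = []
        for _ in range(n_runs):
            t0 = time.perf_counter()
            query_fn()
            times.append(time.perf_counter() - t0)
        times.sort()
        return {"p50": times[len(times) // 2],
                "p99": times[int(len(times) * 0.99) - 1]}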
Identifying a Preliminary Circuit for Predicting Gendered Pronouns in GPT-2 Small
Chris Mathwin, Guillaume Corlouer, Esben Kran, and 2 more authors
We identify the broad structure of a circuit that is associated with correctly predicting a gendered pronoun given the subject of a rhetorical question. Progress towards identifying this circuit is achieved through a variety of existing tools, namely Conmy’s Automatic Circuit Discovery and Nanda’s Exploratory Analysis tools.
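Both tools build on activation patching; a library-free sketch of that intervention (run_model is a stand-in for a hooked forward pass, not either tool's API):

    def patching_effect(run_model, clean, corrupt, layer, pos, pronoun_id):
        # Splice one clean-run activation into a corrupted run (e.g. the
        # subject's gender flipped) and measure how much of the correct
        # pronoun's logit it restores: 0 = no effect, 1 = fully restored.
        # run_model(prompt, patch=None) -> (final_logits, cache), where
        # cache[layer][pos] is the activation at that layer and position.
        clean_logits, clean_cache = run_model(clean)
        corrupt_logits, _ = run_model(corrupt)
        patch = (layer, pos, clean_cache[layer][pos])
        patched_logits, _ = run_model(corrupt, patch=patch)
        return ((patched_logits[pronoun_id] - corrupt_logits[pronoun_id])
                / (clean_logits[pronoun_id] - corrupt_logits[pronoun_id]))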
2022
PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
Pengyi Li, Hongyao Tang, Tianpei Yang, and 7 more authors
Towards Interpretable Sequence Continuation: Analyzing Shared Circuits in Large Language Models
While transformer models exhibit strong capabilities on linguistic tasks, their complex architectures make them difficult to interpret. Recent work has aimed to reverse engineer transformer models into human-readable representations called circuits that implement algorithmic functions. We extend this research by analyzing and comparing circuits for similar sequence continuation tasks, which include increasing sequences of digits, number words, and months.
System III: Learning with Domain Knowledge for Safety Constraints
Fazl Barez, Hosein Hasanbeig, and Alessandro Abate
Reinforcement learning agents naturally learn from extensive exploration. Exploration is costly and can be unsafe in safety-critical domains. This paper proposes a novel framework for incorporating domain knowledge to help guide safe exploration and boost sample efficiency. Previous approaches impose constraints, such as regularisation parameters in neural networks, that rely on large sample sets and are often not suitable for safety-critical domains where agents should almost always avoid unsafe actions.
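As a loose illustration of domain knowledge guiding exploration (simple action masking with a hand-written safety predicate; not the paper's mechanism, which encodes knowledge as logical constraints):

    def safe_action(q_values, state, is_unsafe, actions):
        # Domain knowledge enters as a predicate is_unsafe(state, action);
        # actions it rules out are masked before greedy selection, so the
        # agent need not discover hazards by trial and error.
        allowed = [a for a in actions if not is_unsafe(state, a)]
        if not allowed:  # fall back if the knowledge rules everything out
            allowed = list(actions)
        return max(allowed, key=lambda a: q_values[a])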
2021
ED2: An Environment Dynamics Decomposition Framework for World Model Construction
Cong Wang, Tianpei Yang, Fazl Barez, and 7 more authors
Model-based reinforcement learning methods achieve significant sample efficiency in many tasks, but their performance is often limited by model error. To reduce this error, previous works use a single well-designed network to fit the entire environment dynamics, treating the dynamics as a black box. However, these methods overlook the decomposable nature of the environment: the dynamics may comprise multiple sub-dynamics that can be modeled separately, allowing the world model to be constructed more accurately.
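A minimal sketch of the decomposition idea, with hypothetical names (ED2 additionally learns how to decompose the dynamics, which this sketch takes as given):

    import torch
    import torch.nn as nn

    class DecomposedDynamics(nn.Module):
        # One small predictor per group of state dimensions ("sub-dynamics")
        # instead of a single monolithic model of the transition function.
        def __init__(self, state_dim, action_dim, groups, hidden=64):
            super().__init__()
            self.groups = groups  # partition of state dims, e.g. [[0, 1], [2, 3, 4]]
            self.nets = nn.ModuleList(
                nn.Sequential(
                    nn.Linear(state_dim + action_dim, hidden),
                    nn.ReLU(),
                    nn.Linear(hidden, len(g)),
                )
                for g in groups
            )

        def forward(self, state, action):
            x = torch.cat([state, action], dim=-1)
            next_state = torch.zeros_like(state)
            for net, g in zip(self.nets, self.groups):
                next_state[..., g] = net(x)  # each head models its own sub-dynamics
            return next_state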
Discovering topics and trends in the UK Government web archive
David Beavan, Fazl Barez, M Bel, and 4 more authors
The challenge we address in this report is to take steps towards improving search and discovery of resources within this vast archive for future archive users, and to show how the UKGWA collection could begin to be unlocked for research and experimentation by approaching it as data (i.e. as a dataset at scale). The UKGWA has independently begun to examine the usefulness of modelling the hyperlinked structure of its collection for advanced corpus exploration; the aim of this collaboration is to test algorithms capable of searching for documents via the topics that they cover, envisioning a future convergence of these two research frameworks. This diachronic corpus is ideal for studying the emergence of topics and how they feature across government websites over time, and it will indicate engagement priorities and how these change.
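An illustrative sketch of tracking topics diachronically (scikit-learn LDA over a hypothetical year-keyed corpus; not the project's actual pipeline):

    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    def topics_over_time(docs_by_year, n_topics=20):
        # Fit one topic model over the whole diachronic corpus, then track
        # how much of each year's text every topic accounts for.
        years = sorted(docs_by_year)
        all_docs = [d for y in years for d in docs_by_year[y]]
        counts = CountVectorizer(stop_words="english", max_features=50000)
        X = counts.fit_transform(all_docs)
        lda = LatentDirichletAllocation(n_components=n_topics,
                                        random_state=0).fit(X)
        trends, i = {}, 0
        for y in years:
            n = len(docs_by_year[y])
            trends[y] = lda.transform(X[i:i + n]).mean(axis=0)  # topic share
            i += n
        return lda, trends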