
AI Language Models: Boosting or Threatening Marine Equity?


Abstract

AI Large Language Models (LLMs), like GPT, are starting to reshape some aspects of international environmental policymaking, potentially assisting with certain tedious, resource-intensive work like analyzing and drafting policy instruments, building capacity, and aiding public consultation processes. We are cautiously hopeful that LLMs could be used to promote a marginally more balanced footing among decision makers, particularly benefiting developing countries, which face capacity constraints that put them at a disadvantage in negotiations. To explore their realistic potential, limitations, and risks, we present a case study of an AI chatbot for the recently adopted Biodiversity Beyond National Jurisdiction Agreement and critique its answers to key policy questions. While our case study suggests some promising opportunities, it also raises concerns that LLMs could deepen existing inequities. For instance, they may introduce biases by generating text that overrepresents the perspectives of mainly Western economic centers of power, while neglecting developing countries’ viewpoints.

Introduction

Recent breakthroughs in generative artificial intelligence—namely Large Language Models (LLMs) like ChatGPT1,2—allow computer systems an unprecedented (albeit limited) ability to “understand” documents in context, and engage in natural-feeling conversations about complex topics. These recent LLM breakthroughs have caught researchers by surprise, seemingly exhibiting AI abilities that most thought were still decades away from fruition. LLMs are also the subject of massive hype and confusion. For example, pundits speculate that AI will dramatically reshape many industries in a “fourth industrial revolution”3,4,5; AI has also become one of the most-discussed topics in corporate earnings calls6,7; and tech startups are awash with venture capital investments as they race to develop experimental LLM-based software for nearly every industry8.

LLMs are already having an impact on marine policymaking processes, despite their risks being poorly understood. A number of this paper’s authors have already observed State representatives and delegates using ChatGPT at the UN for purposes including the drafting of interventions, statements, submissions, and biographies; asking it questions to conduct background research; and even generating whole presentations. Some countries have already developed policies for ChatGPT use for their governmental officials9.

Specialized, data-connected AI tools are already finding their way into policy arenas: for example, one can already find commercial and experimental LLM tools for tasks like generating economic impact analysis reports10, summarizing and answering questions about legislation11,12, and finding information in large document databases for UN negotiation processes13. Efforts in other sectors are slightly more mature; for example, using LLMs to make sense of vast document databases in medicine14,15, law16, and academic research17,18.

Similar tools are likely to find their way into ocean policy processes in the near future. AI tools are poised to help policymakers with a variety of tedious tasks, like understanding complex legal documents, drafting often-repetitive and formally-worded statements and policy instruments, or quickly finding answers to specific research questions. LLM tools could be especially useful for policymakers in developing countries, who tend to be overstretched and under-resourced compared to their peers in the developed world.


However, despite their massive hype and rapid uptake, LLM tools’ risks, potential applications, and inner workings are still poorly understood. Researchers are racing to understand how these tools work and why they are so good at what they do, whilst attempting to characterize their emergent behaviors. For example, it is still hotly contested how much LLMs are able to “reason” about new problems, versus how much they merely parrot sentences and patterns from their vast training data19,20,21,22,23.

Concerningly, a substantial and growing body of research is documenting inherent biases in popular AI language models. These biases tend to emanate from their training data and design processes, and embody harmful racial and gender stereotypes19,24,25. Activists and researchers have documented real-world consequences of AI biases, such as discrimination in hiring systems for tracking and filtering job applications26,27,28; discrimination in algorithmically-targeted advertisements for work and housing29,30,31; and the use of AI for criminal sentencing, supposedly predicting a defendant’s risk of re-offending32,33,34.

This article explores the potential equity implications for LLM-based generative AI when it comes to marine policymaking processes. While we worry about the same racial and gender biases that are increasingly documented by critical technology scholars, the distinct and interrelated characteristics of the marine environment present their own set of equity concerns:

  1. Marine governance is trans-boundary in nature35.

  2. The marine environment is a highly productive economic space of global consequence with complex dynamics of social inequality at play36,37.

  3. Capacity and power imbalances among oceanic actors remain profound38,39.

  4. Marine policymakers and researchers have often prioritized ecology over human factors, neglecting cultural, racial, class, and gender distinctions40.

When coupled with the inherent biases and technical limitations of AI language models, these characteristics of the marine environment create particular risks and considerations for using LLMs in marine policy applications.

To evaluate these characteristics and dynamics, this article develops an exploratory case study to examine risks and considerations. We created a “BBNJ Question-Answering Bot” (Fig. 1) for the recently adopted Agreement under the United Nations Convention on the Law of the Sea on the conservation and sustainable use of marine biological diversity of areas beyond national jurisdiction (the BBNJ Agreement)41. The BBNJ Agreement makes a salient equity case study for AI tools because (1) it underwent almost two decades of controversial negotiations between blocs of “developed” and “developing” States, with patterns of inequity and neocolonialism observed during the process38,42,43; and (2) national governments must now determine how to effectively implement the BBNJ Agreement’s obligations within their domestic legal systems—a task that is likely to be exceedingly burdensome for many resource-constrained governments, but something that LLM tools could potentially assist with, given their ability to interpret large document sets. We describe our experimental bot in Section “Methods”, and situate it as a representative model for likely future AI policy tools.

Fig. 1: User interface screenshot for the BBNJ Question-Answering Bot.

1. Users type a question related to the BBNJ agreement. 2. The bot forms its answer after searching for relevant information in a database of BBNJ-related documents. Here, the user can optionally include or exclude some of the sources used for the bot’s answer. 3. A “temperature” parameter sent to ChatGPT, which influences the “randomness” of the answers it generates. 4. The bot generates an answer to the user’s question and displays it here, after clicking “submit.” 5. Here, the user can also browse through the source texts which the bot used to generate its answer. (These are found by searching through the BBNJ documents chosen in step 2, to find the passages most relevant to the user’s question).


We are especially concerned about how AI applications can be biased towards the perspectives of powerful affluent countries, and how their use at the UN could further disadvantage developing countries in negotiation processes. We demonstrate and characterize these biases by analyzing responses from our BBNJ Question-Answering Bot (Section “Biases and equity concerns”). We outline several ways biases can enter the chatbot system, including (1) biases in training data for the underlying foundational language models (like GPT); (2) problems arising from the AI chatbot’s connection to UN negotiation documents, including disproportionate over-representation of affluent viewpoints, as well as the models’ difficulty interpreting the subtle nuances of polite diplomatic language; and (3) biases arising from the design of the chatbot program (aka. “prompt engineering”). We also outline social and institutional factors that could allow these AI errors and biases to perpetuate into policymaking processes, like over-trust and overreliance on AI, potentially de-skilling the workforce and displacing real capacity building. (The Supplementary Material offers key technical background information to help interpret and contextualize these findings.)

While urgent attention should be directed towards the problems and biases of LLMs, we also remain cautiously hopeful that they could play a role in promoting a more balanced footing among delegations in ocean policy negotiations (Section “Opportunities for marginal equity improvements”). Especially considering how many resource-constrained governments currently lack access to the necessary legal and technical expertise and the financial, human and technological resources to participate in these processes, LLM-based AI tools could provide useful capacity-building assistance in understanding legal and policy instruments and could aid public consultation processes by providing easier access to information.

We certainly do not expect LLMs to radically upend the power dynamics of the UN, but LLMs are still likely to change how policymakers do their work, similarly to how the adoption of word processing software or the Internet changed policymaking processes. There is still potential for LLMs to empower developing countries in these international forums, if specifically pursued with enough intention and resources. Furthermore, our work underscores the need for developing countries to build their own technical capacity, to engage with AI on their own terms instead of relying on centralized power from the technology industry and developed countries44.

Results

In this analysis we explore the opportunities and risks that LLMs present for equity in marine policymaking. We underpin our conclusions by critically examining the behavior of our BBNJ Question-Answering Bot, and by integrating insights from computer science research with the extensive sociological literature on inequities in international environmental governance. By linking these together, we describe several technical mechanisms that can introduce bias into AI applications, as well as some other specific mechanisms by which AI may reinforce inequities. (Please refer to the Supplementary Material for technical background information and additional example chats.)

Biases and equity concerns

This section elaborates our concerns that LLMs could further disadvantage developing States and other marginalized actors in marine policymaking. We explore potential biases in AI models and their applications; and also the social and institutional factors which could perpetuate them, like misplaced trust or over-reliance on AI.

By considering example chats from our BBNJ Question-Answering Bot, we demonstrate its possible biases and tendencies, specifically assessing how it might favor the perspectives of the central economic powers, which mainly appeared through the chatbot as Western advanced-capitalist nation-states. It is reasonable to expect that these biases would also be present in other AI tools for international policy in general, because the bot designed for this research follows one of the most common design patterns for LLM applications today.

Biases in underlying language models

Modern chatbot applications are built on top of AI language models to give them a contextual “understanding of the world.” Consequently, biases lie within these models themselves. The creators of the most popular LLMs (GPT, Bard, and Llama) do not reveal what data the models are trained on, only that it includes vast amounts of text from the Internet, as well as published articles and books. But, as scholars have tried to deduce, it is likely that model training data skews to over-represent the viewpoints of developed countries, as they have produced the most online content. Racial, gender, and nationality biases are well-documented in these models24,45,46. The secrecy surrounding LLMs’ training data and inner workings makes them difficult to research, however, and their biases may be worse than researchers realize at present.

With respect to ocean-related issues, producing and accessing relevant information and data tends to be expensive and limited to academic and scientific institutions in developed countries, so the training data are likely to over-represent the perspectives of developed States. For example, in Fig. 2 our BBNJ Question-Answering Bot is more willing to speculate about the BBNJ’s impact on human rights in Thailand than in the USA, likely because the underlying model (GPT) has stronger negative associations between Thailand and human rights. Furthermore, since the models learn the strongest representations of speech patterns from the authors used in their training data, we speculate that this bias in language patterns could negatively impact the models’ abilities to interact with policy documents written in different voices from developing countries, particularly from small island developing States, though this requires further research.

Fig. 2: Example: Biases from Underlying Foundation Model.

This series of examples illustrates how biases in the underlying foundation model (GPT) can carry through to our specific-purpose BBNJ Question-Answering Bot. a Biases in generic ChatGPT, related to international politics. These answers from generic ChatGPT demonstrate a bias in the GPT foundation model itself, learned from its training data. When asked the same question about human rights violations by the USA versus Thailand, ChatGPT consistently gives much softer treatment to the USA, calling the question of human rights violations “complex and subjective”. In contrast, the bot answers much more decisively and negatively about human rights violations in Thailand, listing several specific issues (even as the USA faces many human rights issues which overlap with these). In this case, we infer that GPT has learned this bias from the publicly-available text which makes up its training data, mirroring the inequitable ways in which human rights in Thailand and the USA tend to be discussed in the public sphere. (This example is adapted from one by Ilan Manor9.) b Similarly biased answers from the BBNJ Question-Answering Bot. Asked about how the BBNJ Agreement could impact human rights abuses, our BBNJ Question-Answering Bot has much more to say about Thailand than the USA. This behavior mirrors the biases shown for generic ChatGPT in part (a): the bot’s answer for the USA is much more vague (and not specific to the USA), but for Thailand it goes into much more specific detail about potential human rights abuses. From these similarities we infer that this behavior is likely influenced by the biases in GPT itself, since GPT appears to associate Thailand more strongly with the language of human rights abuses.


Biases in an applications’ document database

Our BBNJ chatbot interacts with a database of legal policy documents (similar to many other AI tools), and this inevitably introduces biases into the system through several different mechanisms47.

Oftentimes, developed countries’ perspectives are over-represented in UN documents because of entrenched institutional power, an issue that continues to be documented by critical sociologists and others who have conducted ethnographies inside such institutions48,49. In fact, this general phenomena of international institutions wielding tremendous power and influence over developing countries has been documented at numerous points and persists at the level of global NGOs and their “field offices” that seek to, in sociologist Michael Goldman’s words “discipline” in the sense of creating ever new fields of knowledge (e.g., interdisciplinary social sciences), while disciplining subjects into being better market actors50. Additionally, developed countries have much greater capacity to simply produce statements, peer-reviewed papers, and media articles supporting their positions; and critical political ecologists and development scholars have contended that this capacity imbalance has been leveraged to extend neocolonial power structures51,52,53. Developing countries generally have less capacity to produce these documents, and their scholars will take much longer to write about the BBNJ’s shortcomings from their perspective. It is also possible that they may be so structurally constrained in their activities, that they do not simply have the time to spend producing documentation to, in effect, compete with the developed countries54. In fact, many developing countries’ positions and issues with the BBNJ Agreement are not well-documented, or simply not translated into English, resulting in a very limited batch of information in the public sphere.

Furthermore, many of the developing countries’ issues and complaints go undocumented because of certain organizational cultural codes of distinction, politeness, and decorum at the UN, or what sociologist Fiona McConnell has described as “repertoires of diplomatic behavior”55,56. For example, complaints must be raised discreetly so as not to offend other parties (e.g. parties will say they “had very rich discussions” or “robust exchanges” instead of writing: “we disagree”). The official documents reviewed as part of this research, for instance, rather triumphantly celebrate the BBNJ negotiations’ successes, but tiptoe around their disappointments. In our examples, GPT apparently struggles to interpret the subtleties and veiled meanings hidden behind the polite tone.

In Fig. 3, the BBNJ Question-Answering Bot demonstrates a tendency to speak very positively and non-critically about the BBNJ’s achievements towards fair access and benefit-sharing, even when specifically asked about the perspectives of developing countries. (However, although representatives from developing States won some hard-fought victories57,58, many would dispute that fair access and benefit-sharing was actually achieved in the final text of the Agreement, contending instead that they merely reached an acceptable compromise.) In this case, the bot is most likely answering this way because it is parroting the negotiation documents’ ceremonial niceties that celebrate the Agreement’s successes: for example, section 2 of the Agreement is titled “Marine Genetic Resources, Including The Fair And Equitable Sharing Of Benefits”41. (Whether it actually accomplishes this is a matter of dispute.) In contrast, many of the criticisms of this mechanism, particularly from developing States’ representatives, are not publicly documented and thus not available for the bot to use in its answer.

Fig. 3: Biases stemming from official BBNJ negotiation documents.

These examples illustrate biases that stem from the bot’s connection to official documents from the BBNJ negotiations. The chatbot tends to speak only very positively about the Agreement’s achievements, and tends not to criticize it. Most likely, the bot answers in this way because it is parroting back the tone and substance of the official documents in our database, which tend to celebrate the Agreement’s achievements while the grievances tend to be masked in subtle diplomatic language. In this, the bot misses much of the context of the negotiation processes. Many of the developing countries’ grievances are not documented anywhere, largely because of the UN’s polite etiquette, so an LLM bot cannot “know” this information. a Question subject to interpretation. In this response, the bot’s answer lauds the successes of the BBNJ towards fair access and benefit sharing, though many negotiators from developing countries would disagree with this assessment, instead contending that the final agreement did not do enough. b “…from the perspective of developing countries…” The behavior persists in this altered version of the question, which specifically asks for developing countries’ perspectives. It gives some more details on developing countries’ demands but still only depicts them in a positive light instead of describing the shortcomings. The bot does, however, hint at the controversy with a brief mention of “divergent opinions” but does not go into any more depth. c Increasingly specific language. This version shows how the bot can be coaxed more towards the viewpoints of less-represented parties by using increasingly specific language. In this case, by specifically naming Small Island Developing States and asking more detailed questions, we are influencing which sources the bot consults for its answer.


Many of the important behind-the-scenes disagreements never find their way into official documents, and this disproportionately affects the voices of already marginalized country actors. AI tools relying on these documents could therefore end up further sweeping the key structural issues at play “under the rug”. Furthermore, the complete lack of documentation on many of the developing countries’ perspectives means that it would be difficult to find a technical solution to this problem via AI “debiasing” techniques like oversampling25.

Biases in prompting and application design

Biases can also be introduced by the technical design of a chatbot or other LLM application; and the AI model’s behavior is influenced by subtle distinctions in how the application programmatically interacts with it. A model can generate different answers depending on the cultural or regional vernacular that is used to ask it a question, or the tone of voice; and many users do not realize that different wordings of a semantically-equivalent prompt can yield dramatically different results59. For example, our BBNJ Question-Answering Bot gave completely opposite answers to a question worded in different tones (Fig. 4). Since LLM applications commonly generate the prompt from a template (Supplementary Figs. 4 and 5), a software designer or user could carefully exploit this behavior to “steer” the model in different ways. (For instance, starting a prompt with “I’m a conservative” or “I’m a liberal” will lead to different answers to a policy question.) However, this behavior could also lead to unintended outcomes, or different model behaviors for users from different cultural backgrounds.
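To make this mechanism concrete, the short Python sketch below illustrates how a templated prompt can be steered by wording choices. The template text, function, and variable names are our own illustrative inventions, not the bot’s actual prompt (which is shown in Supplementary Figs. 1 and 2).

```python
# Illustrative sketch only: how a persona prefix and question wording slot
# into an otherwise identical prompt template.
PROMPT_TEMPLATE = (
    "{persona}"
    "Answer the question using only the BBNJ passages below.\n\n"
    "Passages:\n{passages}\n\n"
    "Question: {question}\nAnswer:"
)

def build_prompt(question: str, passages: str, persona: str = "") -> str:
    """Fill the template; the optional persona line subtly steers the answer."""
    return PROMPT_TEMPLATE.format(persona=persona, passages=passages, question=question)

# The same underlying question, phrased confrontationally versus softly:
blunt = "Why did the BBNJ Agreement fail developing countries?"
soft = "What criticisms have been raised about how the BBNJ Agreement serves developing countries?"

# An application designer (or user) can also prepend a framing statement:
framed = build_prompt(soft, passages="...",
                      persona="I'm a negotiator from a Small Island Developing State. ")
```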

Fig. 4: Biases stemming from prompt wording.

In these examples, different wordings of substantially the same question produce opposing answers from the BBNJ Bot. When asked about the BBNJ’s failures in a somewhat confrontational tone in example (a), the bot responds defiantly and defends the BBNJ’s merits; but it willingly describes criticisms when asked in a softer tone in example (b). This example shows how bias in LLM applications can come both from the user’s choice of words and from the application designers who mediate the user’s interactions with the underlying AI language model. For example, different behaviors can be caused by the program finding different document passages for each question in the preprocessing-search step (Fig. 5). Additionally, LLM chatbots are known to respond confrontationally to confrontational prompts111,112, and the chatbot may have taken a defensive position simply in response to the tone of the question; AI application designers can also thereby cause bias in their applications through their handling or augmentation of the prompt (Supplementary Figs. 1 and 2).


Additionally, LLM applications commonly employ a variety of safety mechanisms to stop the model from producing harmful or offensive responses, and these have sometimes led to discriminatory outcomes. For example, some efforts to filter out offensive training text have inadvertently led to disproportionate censorship of issues affecting minority groups60.

Misplaced trust and overreliance on AI

In addition to these errors and biases, there are also troubling social and institutional phenomena that threaten to exacerbate their harm. Overtrust and overreliance on AI is one such major concern: overreliance happens when people do not know how much to trust an AI agent, causing them to incorrectly or inappropriately accept its recommendations.

Conversations about misplaced trust and overreliance are especially salient and worrisome now, particularly for diplomacy and policymaking, as many people and organizations are grappling with AI for the first time amid hype, confusion, and AI’s mystification61. Researchers are working to identify factors that can create appropriate levels of trust in AI systems: the anthropomorphism and human-feeling interactions of LLM chatbots can lead to over-estimation of their capabilities62,63, and LLM chatbots’ improved abilities to provide explanations for their answers are shown to create more trust, even when the explanations are bogus62,64. Positive first impressions also promote trust with an AI agent62, and thus today in the early LLM boom, many people likely hold ChatGPT in high regard after initially seeing its conversational abilities but do not understand its limitations.

Misplaced trust could cause policymakers and others to over-rely on LLMs. These biases can be subtle and dangerously hard to detect: for example, people’s views can be unknowingly affected when they co-write with opinionated language models65. Confirmation bias can influence people not to fact-check the output of LLMs when their answers align with their prior beliefs, and various forms of “automation bias” lead people to favor recommendations from AI systems over other sources66. Overtrust can also lead to inappropriate uses of AI (like relying on LLM chatbots for strategic decision-making), or asking inappropriate types of questions (like complex analysis or value-laden judgements, which the LLM mostly answers by repeating text patterns in its training data).

Displacing real capacity building

Capacity-building is a key concept in international environmental policy, and it relies predominantly on technology cooperation, programmes of assistance, collaborative arrangements and partnerships67. These approaches often rely on individuals from developed States who possess relevant legal, policy and technical expertise providing assistance to their counterparts in developing States. For example, this assistance could include research and monitoring programmes and the transfer of knowledge.

There is a danger that over-reliance on convenient AI tools could hamper the more difficult work of meaningful capacity building, especially as capacity limitations could severely constrain the ability of developing States to fully and effectively participate in, and enjoy the rights granted under, the BBNJ Agreement. Over-reliance on AI could potentially lead to de-skilling of staff62: for example, it may be tempting to find quick and convenient answers from a chatbot, instead of seeking out assistance from experts or neighboring States. These interpersonal interactions are crucial for capacity building, especially since some information can only be learned from people who were “in the room” during negotiations. (The aforementioned limitations of AI not having access to informal corridor discussions also apply to humans, in fact.) Furthermore, the hype surrounding AI presents a danger that developed States could try to cut back on cooperation or assistance programs, incorrectly arguing that AI is a sufficient replacement.

Opportunities for marginal equity improvements

We are still hopeful that LLMs could yield some positive results for developing States and other marginalized actors, despite the equity concerns that we have outlined. The most promising opportunities stem from capacity imbalances: developing States have far fewer financial resources, diplomatic staff, and research capacity to advance their agendas at the UN compared to wealthy powerful States. For example, officials in many under-resourced governments are frequently rotated between different assignments, often needing to jump into negotiations where they have little prior experience or background knowledge, and to quickly learn the intricacies of complex issues and lengthy, detailed legal instruments. Officials are over-burdened and busy, needing to quickly write ministerial responses and prepare presentations on a smattering of issues. Some of these gaps are well-suited for AI tools, like helping to draft and understand legal and policy instruments, and aiding with public consultation.

AI is likely to change some aspects of the policy-making process, but we have no reason to expect it to fix these fundamental capacity and power imbalances at the UN. Critical technology scholars have argued that technology improvements tend to amplify the power of those who are best positioned to wield them, rather than “level the playing field” or “democratize information”68. Developed States will also increasingly use AI towards their diplomatic goals, and have better technical capacity to exploit it. (In making this forecast, we look towards analogous technology advances like word processing software and the Internet, which changed policy-making without upending the power balance.) Importantly, AI tools are no substitute for other needed measures like enabling developing countries to hire more diplomatic staff and reforming institutional structures to be more inclusive of developing countries69.

However, AI language models can still be another tool in the pockets of developing States to use in marine policy processes, and this section outlines some of the most promising opportunities. In some cases we expect States to use commercial off-the-shelf AI tools to fill these needs. However, we still call for specific research attention to these applications, especially since the technology industry will continue to cater to the economically powerful countries, largely neglecting the developing world44. Importantly though, all of these potential applications depend on having trustworthy, accurate, and fair AI tools and language models, necessitating further improvements in the technology.

Capacity building

Multilateral environmental initiatives like the BBNJ Agreement are unlikely to succeed without the participation of all countries70; and capacity building and technology transfer are still necessary for under-resourced governments in developing countries to fulfill their obligations for protecting the high seas39,67. Small, under-resourced governments commonly face particular challenges, like staff shortages, difficulty keeping momentum and institutional memory across political changes and successive government restructuring, and reliance on outside technical capacity71. Overstretched officials often need to juggle many subjects at once and get up to speed on complex issues. This creates a significant disadvantage72: officials from developing countries commonly need to jump into negotiations on unfamiliar topics with little time to prepare, often against better-resourced, specialized negotiators from developed States.

The strengths of LLMs could lend themselves well to these problems, helping officials quickly find the information they need to get up to speed. For example, our BBNJ Question-Answering Bot tended to perform well with questions about specific details within voluminous documents (Supplementary Figs. 1 and 2). Search engines like Google and Bing also exemplify this, as they use LLMs to find quick answers to users’ questions73,74.

A particular strength of LLM tools is their ability to provide material tailored to each user’s knowledge level, rephrased and explained for inexperienced or expert users. Their speed is also an advantage when conversing about quickly-changing topics: for example, it takes time to produce textbooks and educational materials about developing countries’ positions on the BBNJ Agreement, and these resources remain underdeveloped in 2025, but LLMs (provided with up-to-date data) could help unfamiliar people start engaging with these subjects much more quickly than waiting for someone to produce educational materials. To some extent, LLM tools could also help organizations retain institutional memory even when staff turnover is high, by helping people engage with information from records of prior negotiations.

Understanding legal and policy documents

LLM tools can potentially help users understand complex legal documents, and this could be especially salient for under-resourced governments with staff shortages and without the same access to legal, policy, and scientific expertise67.

The current challenge of ratifying and implementing the BBNJ Agreement illustrates this problem, as countries scramble to understand their obligations under the Agreement and how to pass legislation to implement them in their own domestic sphere. This will require a great deal of work, and many different stakeholders within each government need to parse out intricacies from the Agreement’s complicated legal language. For example, government officials need to figure out how to implement the Agreement’s Clearing-house Mechanism when it comes to marine genetic resources and how the vaguely-defined processes will work. Other officials will need to understand their country’s obligations when it comes to environmental impact assessments under the BBNJ Agreement. Furthermore, fishing representatives might worry whether the Agreement has any provisions that restrict vessel movements. Many officials lack access to legal resources to confidently answer these questions.

The BBNJ Question-Answering Bot exemplifies how LLMs can be useful towards some of these problems, like answering questions about specific details of the Agreement text, or producing simplified summaries of topics. However, there is potential to extend LLM applications to address more of these needs via technology improvements, creative prompt engineering, and combining additional data sources. For example, one could imagine LLM applications that compare different policies, like explaining how the BBNJ Agreement relates to its parent treaty, the 1982 UN Convention on the Law of the Sea; or comparing the BBNJ Agreement’s obligations with a country’s current laws and regulations. As another example, officials already commonly use software tools to show changes between different versions of a draft text, but future AI tools could provide additional helpful information, particularly in respect to shedding light on the meaning behind those changes. Future tools could also provide helpful contextual information as a user reads a document, like finding relevant research articles or policy instruments for each section, or maybe even fact-checking.

AI-assisted writing tools

LLM-powered writing assistants could become one of the most important ways that AI reshapes the policymaking process, and developing countries could be especially positioned to benefit from them. Producing formal documents can be a hugely time-consuming and tedious requirement for participating at the UN: government officials spend considerable time writing procedural or ceremonial boilerplate text, or summarizing and rehashing prior texts54. LLMs are already especially good at this type of writing, like producing summaries or repetitive documents. It can also be time-consuming and tedious for government officials to write in the formal tones of the UN, especially when there is a language barrier. LLMs are already well-suited to this, commonly used to re-write ideas in more formal language: popular commercial writing products like Microsoft Word and Grammarly are already incorporating these features via LLMs74,75,76.

We are likely to see increasingly creative technology designs for AI writing tools in the near future, as human-AI interaction researchers explore new ways for AI agents to co-produce a document in a dance with a human user77,78. For example, future AI writing tools will likely guess what a user is trying to do and try to give helpful suggestions, and engage the user in a dialogue about how the final product should look79. There are ripe opportunities to tailor these interactions for policy-making: an AI tool could look up relevant policy instruments or research as you type, or interactively fact-check your writing on the fly.

Public consultation

Ocean policy covers the largest geographic scope on earth, but the negotiations’ inclusivity is extremely narrow: very few people are invited to participate in governments or in institutions like the UN. Governments therefore embark on a variety of consultative processes to engage their citizens and understand the public’s views, since each government has an interest in serving its own citizens80,81.

LLMs are well-positioned to help with this consultative work because of their abilities to aggregate and sift through massive amounts of text. For decades, companies and political parties have been using a smattering of AI tools to understand what customers and voters are saying about them on social media, using approaches like keyword extraction and sentiment analysis to gauge the public’s attitudes82,83. LLMs, though, enable much richer and more complex analysis of these public expressions, perhaps allowing governments to quickly learn about new issues as they arise on social media or enabling pseudo-qualitative summarization of the public discourse for policymakers. In addition to passively listening to social media, it is possible that governments could use LLMs to design programs that invite citizens into the public discourse in new and creative ways84.

Technical capacity building

While AI tools could potentially help organizations build capacity to engage in ocean issues, it is also important to invest in building capacity for developing countries to better utilize and control AI technologies for their own prerogatives, instead of being dependent on donors or outsider technologists44.

As LLMs and other forms of AI proliferate, technical capacity-building will become more important. Policymakers will need to learn appropriate levels of trust and distrust in LLMs, to make the most from them while avoiding potential pitfalls85. A policymaker with more experience and training will tend to make fewer errors with AI, since people tend to rely on AI more when working on something unfamiliar62,66. Similarly, improving AI literacy can have a major impact: one exemplary study showed that clinicians having low AI literacy were seven times more likely to follow an AI agent’s recommendations in medical decision-making scenarios86.

To an extent, developing countries will be able to mitigate some of the biases in the AI tools that they use when they have more control over the technology itself. Technologists are experimenting with techniques at various stages of the AI pipeline87, like strategically augmenting models with additional training data to reduce biases without negatively affecting their performance88. Recent work, though, has shown that state-of-the-art “debiasing” methods sometimes only work on a superficial level and are incomplete, and more research is needed25.

Discussion

Here, we offer recommendations and calls to action drawing from the opportunities and risks shown in our case study.

Cautions: biases, errors, overreliance

We outline several ways that problematic biases can enter LLM-based policy analysis tools. We urge technologists to research models and AI applications that mitigate these biases against developing countries, similar to ongoing research on other LLM biases like sexism and racism. Meanwhile, we caution policymakers to be aware of these biases as they use LLMs, or encounter other people’s text that may have been written with LLMs. Furthermore, we caution against overreliance on LLMs and inappropriate uses. For example, LLMs should not be used for decision-making or answering value-laden questions.

More AI research on developing countries’ policymaking needs

We expect technologists and researchers to pursue a deluge of new AI tools for policymaking in the coming years, and we call for special attention to developing countries’ particular contexts and needs. We cannot expect that AI technologies made primarily for the developed world will serve developing countries’ needs just as well, as these countries face additional obstacles like capacity gaps and architectural disadvantages in policymaking processes39,67,89. Furthermore, additional work is required to mitigate biases in AI models that tend to favor developed countries’ perspectives.

Developing countries need AI technical capacity

Our work also highlights the need for policymakers in developing countries to develop their own AI technical capacity. Developing countries should have the agency to develop their own LLM tools that suit their needs44, and improving AI literacy among policymakers would allow them to engage with AI on their own terms. Furthermore, as other countries will also increasingly be using AI in policymaking and marine conservation, developing countries need the technical capacity to push back against AI misuses. It would be appropriate for the technology industry, as well as the developed countries where it is centered, to assist with this technical capacity building.

AI should not replace real capacity-building and equity work

Though there are some ways that AI can assist developing countries with capacity issues, it will not be a solution to power imbalances in environmental policymaking. There is a danger that the developed world will be tempted to cut back on their capacity-building responsibilities, and attempt to rely on cheap technology instead. AI should not replace real capacity-building and equity work in marine policymaking, especially considering the pitfalls outlined in this paper. It is still vitally important for developed countries to devote resources to capacity building and push towards more equitable policymaking processes, especially because every affected country’s full participation is crucial if multilateral environmental efforts are to succeed39,67,89.

Methods

We built a BBNJ Question-Answering Bot as a case study. This paper explores the potential, limitations, risks, and equity implications of AI language models in marine policy by examining the bot’s construction and question-answering behavior.

The BBNJ Agreement, also known as the High Seas Treaty, is a legally-binding agreement that was adopted by consensus at the United Nations on 19 June 2023 following almost two decades of multilateral negotiations90. The BBNJ Agreement opened for signature on 20 September 2023 and will enter into force 120 days after the deposit of the 60th instrument of ratification. The BBNJ Agreement focuses on four thematic issues, namely (1) marine genetic resources including benefit-sharing obligations; (2) area-based management tools including marine protected areas; (3) environmental impact assessments; and (4) capacity-building and the transfer of marine technology.

The bot is accessed via a web page (Fig. 1). Users type a question about the BBNJ Agreement, and the chatbot responds with an answer. Under the hood, the bot uses a common design pattern for data-connected AI chatbot applications at the time of writing: ChatGPT provides conversational instruction-following capabilities and a contextual “understanding of the world,” but it lacks specific expertise on the BBNJ Agreement so we have connected it to a database of BBNJ-related documents and instruments (Fig. 5). When the user types a question, our application first searches for the most-relevant passages from the BBNJ documents, and then provides them in a query to GPT to find an answer. (Section “Bot technical design and implementation” provides technical details).

Fig. 5: Flow diagram representing the structure of the BBNJ Question-Answering Bot application.

When the user asks a question, the program first searches for relevant passages in a database of BBNJ-related documents. It sends the user’s question along with the passages to a ChatGPT server in the cloud, and then displays ChatGPT’s reply. At the time of writing, this is a very common design pattern for AI applications that interact with external documents.


In this paper we consider examples of conversations from our BBNJ Question-Answering bot to analyze its strengths and weaknesses. We posit that findings from our chatbot will be generally applicable to other similar uses of AI language models in policy because it is built using the same predominant design pattern as other LLM applications in policy and elsewhere11,91—including those in OpenAI’s “GPT store”92,93—by connecting the AI model to external data by searching for relevant passages and including them in the model’s context window. Additional design patterns are emerging for building chatbot applications with LLMs, like fine-tuning approaches, but it is unlikely that they will have fundamentally different equity issues than the design pattern we chose for our case study. (Furthermore, it is unlikely that technology improvements could solve these bias and equity issues within the current paradigm of generative AI models. Researchers and software developers have employed a variety of techniques to moderate LLMs’ behaviors, including training data scrubbing, oversampling, reinforcement learning with human feedback, and output filtering. So far though, these techniques have led to incremental improvements but the problem remains fundamentally unsolved94,95,96,97,98.)

Methods for evaluating LLM applications are still emerging. Some related work has developed quantitative benchmarks to measure the frequencies of occurrences of biases like racism and sexism19,25, but this type of frequency measurement is outside the scope of this work. Other work has used qualitative methods to characterize harms from LLMs that are more subtle and nuanced, and harder to automatically detect in a quantitative benchmark99,100,101,102,103. Similarly, we use qualitative analysis to offer a preliminary exploration that outlines some of the potential problems with LLM-based tools in environmental policymaking.

We use the language of “developed” and “developing” countries in our analysis, but we note that this terminology comes with problems. We have chosen to use it in our analysis largely because it is consistent with other writing in the BBNJ space (e.g. the BBNJ Agreement text references the term “developing States” throughout, and the Group of 77 Developing States was the largest and one of the most influential blocs in the negotiations process57). It is important to disclaim that developing countries are not a monolith: for example, there were a number of smaller groups having different positions, such as the Small Island Developing States and the Least Developed Countries. An alternative model like Wallerstein’s “world-systems theory” could be better suited to describe both the consumption of marine resources and the spread of computing technologies: whereby countries on the periphery of the world economic system tend to have lower-skilled, more resource-extractive industries; and this division structurally reinforces the dominance of the powerful affluent countries in the economic center104,105.

Bot technical design and implementation

Our BBNJ Question-Answering Bot follows a common software design pattern used to connect LLMs like GPT to a database of specific documents: it incorporates a search step that first finds relevant text passages, and then inserts them into GPT’s context window by including them in a “user” message, along with the user’s question (Fig. 5).

To scope the document database accessed by the chatbot, we decided to include documents from the fifth (final) session of the BBNJ negotiations106, including:

  • The BBNJ Agreement finalized text (1 file, 21744 words),

  • Prior draft agreements from the 5th session (3 files, 46670 words),
  • Small-group work outcomes (1 file, 16534 words),
  • Delegates’ submitted proposal (1 file, 79223 words),
  • The statement by the President of the Intergovernmental Conference on the suspension of the 5th substantive session (1 file, 5526 words),
  • Party statements (18 files, 10878 words), and
  • Earth Negotiations Bulletin Reports from both rounds of the 5th substantive session (20 files, 86640 words).

We used the Science Parse software107 to extract paragraphs from the PDFs, preserving the heading structure when possible. We then separated the text into passages for indexing in an embeddings model, generally following OpenAI’s recommendations to index the text as one-paragraph-sized passages108. (Through trial and error, we made two modifications to suit the UN documents’ formatting, accommodating small document sections by including them as a whole passage when they had under 200 tokens, roughly 150 words. Because the UN documents also frequently used bulleted lists which Science Parse split into paragraphs, we merged together adjacent “paragraphs” within the same section, into passages having at least 100 tokens, roughly 75 words.)
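For illustration, the following Python sketch approximates the chunking rules just described, assuming the Science Parse output has already been grouped into (heading, paragraphs) pairs. The function and variable names are ours, and token counts are approximated from word counts rather than computed with a real tokenizer.

```python
def split_into_passages(sections, min_tokens=100, max_whole_section=200):
    """Approximate our chunking rules: keep small sections whole, otherwise
    merge adjacent paragraphs until each passage has roughly min_tokens."""
    def approx_tokens(text):
        # Rough heuristic: about 0.75 words per token, i.e. tokens ~ words / 0.75
        return int(len(text.split()) / 0.75)

    passages = []
    for heading, paragraphs in sections:
        section_text = " ".join(paragraphs)
        if approx_tokens(section_text) < max_whole_section:
            passages.append((heading, section_text))   # small section kept as one passage
            continue
        buffer = []
        for para in paragraphs:
            buffer.append(para)
            merged = " ".join(buffer)
            if approx_tokens(merged) >= min_tokens:     # merge short "paragraphs"
                passages.append((heading, merged))      # (e.g. bullet-list fragments)
                buffer = []
        if buffer:                                      # flush any remainder
            passages.append((heading, " ".join(buffer)))
    return passages
```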

We used an embeddings model to make the passages searchable, a common AI technique that assigns a numeric vector that represents the “meaning” of each passage relative to a language model. We generated embeddings with OpenAI’s text-embedding-ada-002 model108, and stored the resulting vectors in a Weaviate vector index109.
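A minimal sketch of this indexing step is shown below, assuming the pre-1.0 openai Python client and Weaviate’s v3 Python client; the “Passage” class name and the local Weaviate URL are placeholders rather than the bot’s actual configuration.

```python
import openai
import weaviate

openai.api_key = "YOUR_OPENAI_KEY"                     # placeholder configuration
client = weaviate.Client("http://localhost:8080")      # placeholder Weaviate instance

def embed(text: str) -> list[float]:
    """Return the embedding vector for one passage (pre-1.0 openai client API)."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
    return resp["data"][0]["embedding"]

def index_passages(passages):
    """Store each (heading, text) passage with its embedding vector."""
    for heading, text in passages:
        client.data_object.create(
            data_object={"heading": heading, "text": text},
            class_name="Passage",
            vector=embed(text),
        )
```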

Upon receiving a question from the user, our application first searches the vector index for the passages most relevant to the user’s question (ranked by the lowest angular/cosine distance between each passage’s embedding and the embedding of the user’s question). The resulting passages are then flattened/concatenated into a text string to include in the prompt to GPT, as many as can fit into the context window, along with their document titles and passage headings. For the example chats shown in this paper, we allowed up to 3000 tokens (2250 words) for the included passages, to also allow room for the user’s questions, prompt instructions, and generated answer, while staying under GPT 3.5’s original context size limit of 4097 tokens.
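The sketch below reproduces this ranking-and-budgeting logic in plain Python with numpy and tiktoken, rather than through Weaviate’s query API; the function and parameter names are ours, and the 3000-token budget matches the figure given above.

```python
import numpy as np
import openai
import tiktoken

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

def retrieve_context(question, passages, passage_vectors, token_budget=3000):
    """Rank passages by cosine similarity to the question embedding and
    concatenate as many as fit within the token budget."""
    resp = openai.Embedding.create(model="text-embedding-ada-002", input=question)
    q = np.array(resp["data"][0]["embedding"])
    vecs = np.array(passage_vectors)
    # Cosine similarity; the lowest cosine distance is the highest similarity.
    sims = vecs @ q / (np.linalg.norm(vecs, axis=1) * np.linalg.norm(q))

    chunks, used = [], 0
    for i in np.argsort(-sims):                 # most similar passages first
        heading, text = passages[i]
        chunk = f"[{heading}]\n{text}"
        n = len(enc.encode(chunk))
        if used + n > token_budget:             # stop once the budget is full
            break
        chunks.append(chunk)
        used += n
    return "\n\n".join(chunks)
```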

The application then generates an answer to the user’s question by sending a request to the GPT API for completion, with the passages, question, and additional instructions embedded into a prompt (Supplementary Figs. 1 and 2). Upon receiving the response from the GPT service, the bot application displays the answer to the user. The graphical interface was built using the Gradio library110.
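A sketch of this final completion step follows, again assuming the pre-1.0 openai ChatCompletion API; the instruction text is a stand-in, since the bot’s actual prompt wording is given in Supplementary Figs. 1 and 2, and the context string is assumed to come from a retrieval step like the one sketched above.

```python
import openai

def answer_question(question, context, temperature=0.0):
    """Embed the retrieved passages and the question into a prompt and
    request an answer from GPT-3.5 (pre-1.0 openai ChatCompletion API)."""
    # Placeholder instructions, not the bot's actual prompt wording.
    prompt = (
        "Answer the question using only the BBNJ source passages below.\n\n"
        f"Passages:\n{context}\n\n"
        f"Question: {question}"
    )
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,   # the "temperature" slider exposed in the web interface
    )
    return resp["choices"][0]["message"]["content"]
```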

For the examples displayed in this paper, we used GPT version 3.5-turbo. We have also experimented with newer OpenAI models including GPT 4 and gpt-3.5-turbo-16k. These newer models can exhibit better comprehension and analytical capabilities and handle more source passages, making them somewhat less error-prone (but also significantly more expensive to run). However, our additional experimentation indicated that the problems and biases described in this paper still persist in these newer models.

Data availability

The data used for building the BBNJ Question-Answering Bot is available at: https://gitlab.cs.washington.edu/mattzig/bbnj-bot-supplementary-material. Details of all the included documents can be found in the file ‘document-manifest.csv’. Correspondence regarding this data should be addressed to M.Z.

Code availability

The source code for the BBNJ Question-Answering Bot is available at: https://gitlab.cs.washington.edu/mattzig/bbnj-bot-supplementary-material. Correspondence regarding the source code should be addressed to M.Z.

References

  1. Ouyang, L. et al. Training language models to follow instructions with human feedback. Proc. ACM NeurIPS. (2022).

  2. OpenAI. GPT-4 Technical Report (2023). arxiv:2303.08774.

  3. French, A., Shim, J. P., Risius, M., Larsen, K. & Jain, H. The 4th Industrial Revolution Powered by the Integration of AI, Blockchain, and 5G. Comms. Assoc. for Info. Syst. 49 (2021).

  4. Manda, M. I. & Ben Dhaou, S. Responding to the challenges and opportunities in the 4th Industrial revolution in developing countries. Proc. 12th Interntl. Conf. on Theory and Practice of Electron. Governance 244–253 (2019).

  5. Skilton, M. & Hovsepian, F. The 4th Industrial Revolution (Springer International Publishing, Cham, 2018).

  6. The Economist. Chief executives cannot shut up about AI (2023).

  7. The Economist. Our early-adopters index examines how corporate America is deploying AI (2023).

  8. The Economist. Large, creative AI models will transform lives and labour markets (2023).

  9. U.S. Advisory Commission on Public Diplomacy. The Use of Artificial Intelligence in Public Diplomacy: ACPD Official Meeting Minutes (2023).

  10. Woodruff, N. Automate policy analysis with PolicyEngine’s new ChatGPT integration. (2023).

  11. Dutia, K., Franks, H. & Alford, J. Hacking AI for climate policy. Climate Policy Radar. (2022).

  12. Angelo, M. Empowering Policy Monitoring: Introducing AI-Generated Summaries on Policy-Insider.ai. Policy-Insider.AI. (2023).

  13. Climate Policy Radar. Using Augmented Intelligence to support the UN Global Stocktake. (2023).

  14. Lee, P., Bubeck, S. & Petro, J. Benefits, Limits, and Risks of GPT-4 as an AI Chatbot for Medicine. New England J. Med. 388, 1233–1239 (2023).

  15. Sifat, R. I. ChatGPT and the Future of Health Policy Analysis: Potential and Pitfalls of Using ChatGPT in Policymaking. Ann. Biomed. Eng. 51, 1357–1359 (2023).

  16. Basu, K. Paralegals Race to Stay Relevant as AI Threatens Their Future. Bloomberg Law. (2023).

  17. Dwivedi, Y. K. et al. Opinion Paper: “So what if ChatGPT wrote it?” Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Intl. J. Info. Mgmt. 71, 102642 (2023).

  18. van Dis, E. A. M., Bollen, J., Zuidema, W., van Rooij, R. & Bockting, C. L. ChatGPT: Five priorities for research. Nature 614, 224–226 (2023).

  19. Liang, P. et al. Holistic Evaluation of Language Models arxiv:2211.09110. (2023).

  20. Metz, C. Microsoft Says New A.I. Shows Signs of Human Reasoning. (The New York Times, 2023).

  21. Bubeck, S. et al. Sparks of Artificial General Intelligence: Early experiments with GPT-4 arxiv:2303.12712. (2023).

  22. Carlini, N. et al. Quantifying memorization across neural language models arXiv:2202.07646. (2023).

  23. Ullman, T. Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks arxiv:2302.08399. (2023).

  24. Noble, S. U. Algorithms of Oppression: How Search Engines Reinforce Racism Illustrated edition edn (NYU Press, New York, 2018).

  25. Gonen, H. & Goldberg, Y. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. Proc. NAACL 609–614 (2019).

  26. Chen, Z. Ethics and discrimination in artificial intelligence-enabled recruitment practices. Humanities and Soc. Sci. Comms. 10, 1–12 (2023).


  27. Buyl, M., Cociancig, C., Frattone, C. & Roekens, N. Tackling Algorithmic Disability Discrimination in the Hiring Process: An Ethical, Legal and Technical Analysis. Proc. ACM FAccT 1071–1082 (2022).

  28. Sánchez-Monedero, J., Dencik, L. & Edwards, L. What does it mean to ‘solve’ the problem of discrimination in hiring? social, technical and legal perspectives from the UK on automated hiring systems. Proc. ACM FAccT 458–468 (2020).

  29. Kingsley, S., Wang, C., Mikhalenko, A., Sinha, P. & Kulkarni, C. Auditing Digital Platforms for Discrimination in Economic Opportunity Advertising arxiv:2008.09656. (2020).

  30. Lambrecht, A. & Tucker, C. Algorithmic Bias? An Empirical Study of Apparent Gender-Based Discrimination in the Display of STEM Career Ads. Mgmt. Sci. 65, 2966–2981 (2019).

  31. Ali, M. et al. Discrimination through Optimization: How Facebook’s Ad Delivery Can Lead to Biased Outcomes. Proc. ACM CHI 3, 199:1–199:30 (2019).


  32. Park, A. L. Injustice ex machina: Predictive algorithms in criminal sentencing. UCLA Law Rev. 9 (2019).

  33. Angwin, J., Larson, J., Mattu, S. & Kirchner, L. Machine Bias: There’s software used across the country to predict future criminals. And it’s biased against blacks. ProPublica (2016).

  34. Sankin, A. et al. Crime Prediction Software Promised to Be Bias-Free. New Data Shows It Perpetuates It. Gizmodo (2021).

  35. Merrie, A. et al. An ocean of surprises – Trends in human use, unexpected dynamics and governance challenges in areas beyond national jurisdiction. Global Environmental Change 27, 19–31 (2014).

  36. Campbell, L. M. et al. Global oceans governance: New and emerging issues. Annu. Rev. of Envt. and Resources 41, 517–543 (2016).

  37. Clark, T. P. & Longo, S. B. Examining the effect of economic development, region, and time period on the fisheries footprints of nations (1961–2010). Intl. J. Comparative Sociology 60, 225–248 (2019).

  38. Tolochko, P. & Vadrot, A. The usual suspects? Distribution of collaboration capital in marine biodiversity research. Marine Policy 124, 104318 (2021).

  39. Cisneros-Montemayor, A. M. et al. Enabling conditions for an equitable and sustainable blue economy. Nature 591, 396–401 (2021).

  40. Gollan, N. & Barclay, K. ’It’s not just about fish’: Assessing the social impacts of marine protected areas on the wellbeing of coastal communities in New South Wales. PLOS ONE 15, e0244605 (2020).

  41. United Nations. Agreement under the united nations convention on the law of the sea on the conservation and sustainable use of marine biological diversity of areas beyond national jurisdiction. (2023).

  42. Becker Lorca, A. After TWAIL’s Success, What Next? Afterword to the Foreword by Antony Anghie. Euro. J. Intl. Law 34, 779–786 (2023).

  43. Wolfenden, A. & Penjueli, M. Blue Economy: Industrialization and Militarization of Oceans? Development 66, 40–45 (2023).

  44. Hassan, Y. Governing algorithms from the South: A case study of AI development in Africa. AI & Society 38, 1429–1442 (2022).

  45. Narayanan Venkit, P., Gautam, S., Panchanadikar, R., Huang, T.-H. & Wilson, S. Nationality bias in text generation. Proc. EACL 116–122 (2023).

  46. Bansal, R. A Survey on Bias and Fairness in Natural Language Processing. arXiv:2204.09591 (2022).

  47. Kraft, A. & Soulier, E. Knowledge-Enhanced Language Models Are Not Bias-Proof: Situated Knowledge and Epistemic Injustice in AI. Proc. ACM FAccT 1433–1445 (2024).

  48. Falzon, D. The Ideal Delegation: How Institutional Privilege Silences “Developing” Nations in the UN Climate Negotiations. Social Problems 70 (2021).

  49. Goldman, M. Imperial Nature: The World Bank and Struggles for Social Justice in the Age of Globalization (Yale University Press, 2005). j.ctt1nq3np.

  50. Goldman, M. The Birth of a Discipline: Producing Authoritative Green Knowledge, World Bank-Style. Ethnography 2, 191–217 (2001).

  51. Goldman, M. Constructing an Environmental State: Eco-governmentality and other Transnational Practices of a ’Green’ World Bank. Social Problems 48, 499–523 (2001).

  52. Goldman, M. How “Water for All!” policy became hegemonic: The power of the World Bank and its transnational policy networks. Geoforum 38, 786–800 (2007).

  53. Milne, S. Corporate Nature: An Insider’s Ethnography of Global Conservation (University of Arizona Press, 2022).

  54. Hull, M. Documents and Bureaucracy. Annu. Rev. Anthro. (2012).

  55. McConnell, F. Performing Diplomatic Decorum: Repertoires of “Appropriate” Behavior in the Margins of International Diplomacy. Intl. Political Sociology 12, 362–381 (2018).

  56. Jones, A. & Clark, J. Performance, Emotions, and Diplomacy in the United Nations Assemblage in New York. Ann. Amer. Assoc. Geographers 109, 1262–1278 (2019).


  57. Gala López, Y. Statement by Ambassador Yuri Gala López, Chargé d’Affaires of the Cuban Permanent Mission, on behalf of G-77 + China, at the adoption session of the BBNJ Agreement, New York, 19 June 2023. (2023).

  58. Kanu, M. I. How Africa benefits from the new historic “high seas” treaty on maritime biodiversity. UN Africa Renewal (2023).

  59. Zamfirescu-Pereira, J., Wong, R. Y., Hartmann, B. & Yang, Q. Why Johnny Can’t Prompt: How Non-AI Experts Try (and Fail) to Design LLM Prompts. Proc. ACM CHI 1–21 (2023).

  60. Dodge, J. et al. Documenting large webtext corpora: A case study on the colossal clean crawled corpus. Proc. Conf. on Empirical Methods in NLP 1286–1305. (2021).

  61. Manor, I. Opinion: ChatGPT and the Threat to Diplomacy. E-International Relations. (2023).

  62. Lockey, S., Gillespie, N., Holm, D. & Someh, I. A. A Review of Trust in Artificial Intelligence: Challenges, Vulnerabilities and Future Directions. Hawaii Intl. Conf. on System Sciences. (2021).

  63. Culley, K. & Madhavan, P. A note of caution regarding anthropomorphism in HCI agents. Computers in Human Behavior 29, 577–579 (2013).

  64. Kroll, J. A. The fallacy of inscrutability. Phil. Trans. Royal Society A: Math., Phys., and Engr. Sci. 376, 20180084 (2018).

  65. Jakesch, M., Bhat, A., Buschek, D., Zalmanson, L. & Naaman, M. Co-Writing with Opinionated Language Models Affects Users’ Views. Proc. ACM CHI 1–15 (2023).

  66. Passi, S. & Vorvoreanu, M. Overreliance on AI Literature Review. Technical report, microsoft.com (2022).

  67. Vierros, M. K. & Harden-Davies, H. Capacity building and technology transfer for improving governance of marine areas both beyond and within national jurisdiction. Marine Policy 122, 104158 (2020).

  68. Toyama, K. Technology as amplifier in international development. Proc. iConference 75–82 (2011).

  69. Österblom, H. et al. Towards Ocean Equity. Working Paper, High Level Panel for a Sustainable Ocean Economy. (2020).

  70. Harden-Davies, H. et al. How can a new UN ocean treaty change the course of capacity building? Aquatic Conservation: Marine and Freshwater Ecosystems 32, 907–912 (2022).

  71. Benzaken, D., Voyer, M., Pouponneau, A. & Hanich, Q. Good governance for sustainable blue economy in small islands: Lessons learned from the Seychelles experience. Front. Poli. Sci. 4 (2022).

  72. Caldeura, M. & Lopes, V. A path towards equity and fair opportunities – ECOPs from developing countries and the BBNJ negotiations. (2023).

  73. Reid, E. Supercharging Search with generative AI. Google Blog. (2023).

  74. Kelly, S. M. Microsoft is bringing ChatGPT technology to Word, Excel and Outlook. CNN Business. (2023).

  75. Warren, T. Microsoft is looking at OpenAI’s GPT for Word, Outlook, and PowerPoint. (2023).

  76. Alikaniotis, D. & Raheja, V. The Unreasonable Effectiveness of Transformer Language Models in Grammatical Error Correction. arXiv:1906.01733 (2019).

  77. Lee, M. et al. Evaluating Human-Language Model Interaction. arXiv:2212.09746 (2023).

  78. Yang, D., Zhou, Y., Zhang, Z., Li, T. J.-J. & LC, R. AI as an Active Writer: Interaction strategies with generated text in human-AI collaborative fiction writing. Joint Proc. ACM IUI Workshops 10 (2022).

  79. Heer, J. Agency plus automation: Designing artificial intelligence into interactive systems. Proc. Nat. Acad. Sci. USA 116, 1844–1850 (2019).

  80. Cook, D. Consultation, for a Change? Engaging Users and Communities in the Policy Process. Social Policy & Admin. 36, 516–531 (2002).

  81. Catt, H. & Murphy, M. What voice for the people? categorising methods of public consultation. Aus. J. Poli. Sci. 38, 407–421 (2003).

  82. Birjali, M., Kasri, M. & Beni-Hssane, A. A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowledge-Based Systems 226, 107134 (2021).

  83. Hristova, G., Bogdanova, B. & Netov, N. Design of ML-based AI system for mining public opinion on e-government services in Bulgaria. AIP Conf. Proc. 2505, 020005 (2022).

  84. Androutsopoulou, A., Karacapilidis, N., Loukis, E. & Charalabidis, Y. Transforming the communication between citizens and government through AI-guided chatbots. Govt. Info. Quarterly 36, 358–367 (2019).

  85. Peters, T. M. & Visser, R. W. The Importance of Distrust in AI. arXiv:2307.13601 (2023).

  86. Kim, A., Yang, M. & Zhang, J. When Algorithms Err: Differential Impact of Early vs. Late Errors on Users’ Reliance on Algorithms (SSRN Scholarly Paper ID 3691575). Soc. Sci. Res. Net. (2020).

  87. Li, Y., Du, M., Song, R., Wang, X. & Wang, Y. A Survey on Fairness in Large Language Models. arXiv:2308.10149 (2023).

  88. Hong, R., Kohno, T. & Morgenstern, J. Evaluation of targeted dataset collection on racial equity in face recognition. Proc. AIII/ACM AIES 531–541 (2023).

  89. Campbell, L. M. et al. Architecture and agency for equity in areas beyond national jurisdiction. Earth System Governance 13, 100144 (2022).

  90. European Commission. A high ambition coalition on biodiversity beyond national jurisdiction, protecting the ocean: Time for action. (2022).

  91. Castro, P. Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search. (2023).

  92. OpenAI. Introducing GPTs. (2023).

  93. Metz, C. OpenAI Lets Mom-and-Pop Shops Customize ChatGPT. (The New York Times, 2023).

  94. McIntosh, T. R., Susnjak, T., Liu, T., Watters, P. & Halgamuge, M. N. The Inadequacy of Reinforcement Learning from Human Feedback – Radicalizing Large Language Models via Semantic Vulnerabilities. IEEE Trans. Cognitive and Developmental Syst. 1–14 (2024).

  95. Mahomed, Y., Crawford, C. M., Gautam, S., Friedler, S. A. & Metaxa, D. Auditing GPT’s Content Moderation Guardrails: Can ChatGPT Write Your Favorite TV Show? Proc. ACM FAccT 660–686 (2024).

  96. OpenAI. GPT-4 System Card. Tech. Rep., OpenAI (2023).

  97. Whitney, C. D. & Norman, J. Real Risks of Fake Data: Synthetic Data, Diversity-Washing and Consent Circumvention. Proc. ACM FAccT 1733–1744 (2024).

  98. FAccT ’24: Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (Association for Computing Machinery, New York, NY, USA, 2024).

  99. Gadiraju, V. et al. “I wouldn’t say offensive but…”: Disability-Centered Perspectives on Large Language Models. Proc. ACM FAccT 205–216. (2023).

  100. Haroon, R. & Dogar, F. TwIPS: A Large Language Model Powered Texting Application to Simplify Conversational Nuances for Autistic Users. Proc. ACM ASSETS 1–18 (2024).

  101. Chheda-Kothary, A., Wobbrock, J. O. & Froehlich, J. E. Engaging with Children’s Artwork in Mixed Visual-Ability Families. Proc. ACM ASSETS 1–19 (2024).

  102. Koenecke, A., Choi, A. S. G., Mei, K. X., Schellmann, H. & Sloane, M. Careless Whisper: Speech-to-Text Hallucination Harms. Proc. ACM FAccT 1672–1681 (2024).

  103. Chien, J. & Danks, D. Beyond Behaviorist Representational Harms: A Plan for Measurement and Mitigation. Proc. ACM FAccT 933–946 (2024).

  104. Wallerstein, I. World-Systems Analysis: An Introduction (Duke University Press, 2004).

  105. Wallerstein, I. The Modern World-System I: Capitalist Agriculture and the Origins of the European World-Economy in the Sixteenth Century 1 edn (University of California Press, 2011). https://doi.org/10.1525/j.ctt1pnrj9.

  106. United Nations General Assembly. Fifth substantive session – Intergovernmental Conference on Marine Biodiversity of Areas Beyond National Jurisdiction. (2023).

  107. AI2. Science Parse. (2018).

  108. OpenAI. Embeddings – OpenAI Platform. (2023).

  109. Dilocker, E. et al. Weaviate. (2023).

  110. Abid, A. et al. Gradio: Hassle-free sharing and testing of ML models in the wild. arXiv:1906.02569 (2019).

  111. Morris, C. Microsoft’s new Bing AI chatbot is already insulting and gaslighting users. Fast Company (2023).

  112. Perrigo, B. Bing’s AI Is Threatening Users. That’s No Laughing Matter. Time (2023).

Acknowledgements

The authors would like to gratefully acknowledge Angelique Pouponneau, Hussain Sinan, and Angela Abolhassani for helpful conversations that shaped the direction of this work. We are also grateful to the Nippon Foundation Ocean Nexus Center for their funding and support. (The Nippon Foundation played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript).

Author information

Contributions

M.Z. wrote the main manuscript text with substantial contributions from S.L. and B.O.; M.Z. and S.L. created the chatbot application and analyzed the example chats. R.A. and Y.O. jointly supervised the work. All authors reviewed the manuscript.

Corresponding author

Correspondence to Matt Ziegler.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ziegler, M., Lothian, S., O’Neill, B. et al. AI language models could both help and harm equity in marine policymaking. npj Ocean Sustain 4, 32 (2025). https://doi.org/10.1038/s44183-025-00132-7

