Artificial intelligence is rapidly transforming our world, promising unprecedented benefits while simultaneously introducing unforeseen challenges. Ensuring its responsible development and deployment requires a clear understanding of potential pitfalls. This research delves into the multifaceted landscape of AI risks, seeking to identify and categorize the factors that can lead to systemic harm. It explores how these risks manifest across different areas, from biased decision-making to security vulnerabilities, and examines the underlying causes that drive their emergence. By providing a structured framework for analyzing AI’s potential downsides, this work aims to inform the ongoing efforts to build safer, more ethical, and beneficial AI systems.
What factors can create systemic risks?
Systemic risks in the context of AI arise from various interconnected factors that can amplify and propagate harm throughout a broader system. One crucial factor is the entity responsible for causing the risk: an AI system acting autonomously, humans intentionally or unintentionally misusing AI, or external factors influencing AI behavior. Notably, a significant portion of identified AI risks stem from decisions made and actions taken by AI systems themselves rather than from direct human intervention, highlighting the potential for AI to generate risks independently, even without malicious human intent. Whether the risk is an expected outcome of planned actions (intentional) or an unforeseen consequence (unintentional) also plays a key role. Intentional and unintentional risks appear in roughly equal measure, underscoring the challenge of predicting and mitigating unintended consequences while also accounting for risks deliberately designed to cause harm. Finally, the stage of the AI lifecycle at which these risks manifest, whether pre-deployment during development or post-deployment once the system is in use, shapes their systemic potential by indicating which actions are hardest to contain.
Specific characteristics of AI systems themselves also contribute to systemic risks. A lack of robustness, especially when AI systems encounter unforeseen circumstances or biased or incomplete data, allows failures with far-reaching impacts across the interconnected systems that rely on them. Transparency is equally crucial: difficulty interpreting AI decision-making processes can erode trust, impede accountability, and ultimately inhibit the ability to detect and correct errors before they escalate into systemic issues. Intertwined socioeconomic factors also fuel systemic risk. The concentration of power and resources among the few entities able to afford sophisticated AI increases societal inequality and raises the potential for biased or manipulative AI systems. Widespread deployment can displace human workers or create exploitative labor conditions, producing instability that is difficult to contain within any single sector of society. Further compounding these concerns, the rapid pace of AI development can outstrip governance mechanisms and ethical standards, creating failures of regulation and oversight capable of exacerbating systemic harm.
What is the methodology of this study?
The methodology employed in this study is multi-faceted, incorporating a systematic literature search, expert consultation, and a “best fit” framework synthesis. The overall aim was to create a comprehensive AI Risk Repository that could serve as a common frame of reference for understanding and addressing risks associated with AI. The process began with an extensive search of academic databases, including Scopus, and preprint servers such as arXiv and SSRN, to identify relevant articles, reports, and other documents proposing new frameworks, taxonomies, or other structured classifications of risks from AI. Pre-specified inclusion rules defined which studies were eligible, and dual title/abstract screening was made faster and more effective through the use of active learning in ASReview.
Following the initial literature search, the researchers conducted forward and backward citation searching and consulted experts to identify additional relevant materials. This involved reviewing the bibliographies of selected papers and asking experts in the field for recommendations. The research team then extracted information about 777 distinct risks from 43 documents into a “living” database. To classify these risks, the researchers adopted a “best fit framework synthesis” approach: they selected existing classification systems from the identified literature and iteratively adapted them to categorize the risks in the database effectively. Coding followed grounded theory principles, analyzing the data as presented in the original sources rather than interpreting beyond them.
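To make the extraction step concrete, the sketch below shows one way a single entry in such a “living” database might be represented in Python. This is an illustrative structure only: the class name, field names, and example entry are assumptions, not the repository’s actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExtractedRisk:
    """One risk entry extracted from a source document (illustrative only)."""
    document_id: str                    # which of the 43 included documents it came from
    category: str                       # risk category as named by the original authors
    subcategory: Optional[str] = None   # sub-category, where the source provides one
    description: str = ""               # risk text as presented in the original source
    # Codes applied later, during the "best fit" framework synthesis:
    causal_codes: dict = field(default_factory=dict)   # e.g. Entity, Intent, Timing
    domain_codes: dict = field(default_factory=dict)   # e.g. domain and subdomain

# Hypothetical example entry, not taken from the actual repository:
entry = ExtractedRisk(
    document_id="doc_17",
    category="Misinformation",
    description="AI systems generating false or misleading information",
)
```

A record of this shape keeps the risk text as the original authors framed it while leaving room for the taxonomy codes added during synthesis.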
Taxonomy Development
The synthesis process ultimately led to the development of two distinct taxonomies: a high-level Causal Taxonomy of AI Risks, which classifies risks by their causal factors (Entity, Intentionality, and Timing), and a mid-level Domain Taxonomy of AI Risks, which categorizes risks into seven domains (Discrimination & toxicity, Privacy & security, etc.) and 23 subdomains. These taxonomies were developed iteratively, with initial frameworks being tested, refined, and expanded upon based on the data extracted from the literature. The goal was to create a unified classification system that could effectively capture the diverse perspectives on AI risks and facilitate a more coordinated approach to managing these risks.
What are the characteristics of the documents included in this study?
The study included a diverse range of documents, 43 in total: 17 peer-reviewed articles, 16 preprints, 6 conference papers, and 4 other reports. The identified literature was mostly recent, with almost all documents published after 2020, reflecting the rapidly evolving landscape of AI risk research. The corresponding authors were primarily based in the USA, China, the United Kingdom, and Germany, indicating a significant concentration of research efforts in these regions. Affiliations varied, with most corresponding authors representing universities, followed by industry organizations, and a smaller number from government, international, or non-government organizations. A notable trend was the prevalence of narrative reviews or “surveys” as the most common methodology, followed by systematic and scoping reviews, suggesting an emphasis on synthesizing existing literature rather than on primary empirical investigation.
The included documents presented a combined 777 risk categories and sub-categories, demonstrating the breadth of AI risk considerations. Two documents, however, were excluded from later coding because they did not present distinct risk categories according to the study’s definitions. The framing of risk and AI risk varied significantly, with only a few documents explicitly defining risk. The classifications, frameworks, and taxonomies used varied terms to describe risks, including “risks of/from AI,” “harms of AI,” “AI ethics,” “ethical issues/concerns/challenges,” “social impacts/harms,” and others, indicating a lack of standardization in terminology and scope. Most documents also did not explicitly define the type of AI assessed, although large language models emerged as the most frequent target of risk assessment. These characteristics highlight the heterogeneity and evolving nature of AI risk research, underscoring the need for a comprehensive repository that can accommodate and categorize the diverse perspectives and methodologies employed.
How are AI risks classified based on causal factors?
AI risks can be classified according to their underlying causal factors, providing a framework for understanding how, when, and why these risks may emerge. One such classification system, here termed the Causal Taxonomy of AI Risks, categorizes risks along three primary dimensions: the Entity responsible for the risk, the Intent behind the action leading to the risk, and the Timing of the risk occurrence. These dimensions provide a structured approach to dissecting the origins and progression of AI-related harms. Considering them helps to differentiate, for example, between risks originating from intentional malice versus unintentional errors in design, or between risks primarily attributable to the AI versus those mainly stemming from human decisions.
The ‘Entity’ dimension identifies whether the risk is primarily caused by a human decision or action, by an AI system itself, or by some other or ambiguous source. The ‘Intent’ dimension distinguishes between risks that arise as an expected outcome of pursuing a specific goal (intentional risks) and those that occur as unexpected or unintended consequences (unintentional risks). A third option, ‘Other’, acknowledges that the original categorization of a risk may leave intentionality unclear, for example when a risk arises simply from environmental constraints. The ‘Timing’ dimension categorizes risks by whether they occur before the AI system is deployed (pre-deployment) or after it has been trained and put into use (post-deployment); it likewise includes an ‘Other’ option for risks whose time of occurrence is not clearly specified, acknowledging that some descriptions of risk span multiple stages or contexts. By detailing who causes a risk, whether it is intended, and when it occurs, this classification allows risk analysis to be more targeted and supports the development of comprehensive auditing systems.
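A minimal sketch of these three dimensions and their levels, as described above, is shown below. The Python names are illustrative, and the example coding at the end is hypothetical rather than drawn from the repository.

```python
from enum import Enum
from typing import NamedTuple

class Entity(Enum):
    HUMAN = "Human"
    AI = "AI"
    OTHER = "Other"

class Intent(Enum):
    INTENTIONAL = "Intentional"
    UNINTENTIONAL = "Unintentional"
    OTHER = "Other"

class Timing(Enum):
    PRE_DEPLOYMENT = "Pre-deployment"
    POST_DEPLOYMENT = "Post-deployment"
    OTHER = "Other"

class CausalCode(NamedTuple):
    """Causal Taxonomy code assigned to a single risk."""
    entity: Entity
    intent: Intent
    timing: Timing

# Hypothetical example: a misuse risk framed as deliberately caused by humans
# after the system is in use would be coded along these lines.
misuse_code = CausalCode(Entity.HUMAN, Intent.INTENTIONAL, Timing.POST_DEPLOYMENT)
```

Treating each risk as one combination of these three codes is what makes the cross-domain comparisons discussed below possible.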
This approach to classification also reveals how particular domains of AI risk are generally presented in research and analysis. By examining which Entity, Intent, and Timing codes are most consistently assigned in discussions of a given risk, one can gauge how coherently that risk is framed across the literature. Conversely, identifying which factors vary widely when researchers present and discuss a specific type of risk offers insight into how its nature and origin are perceived, and into the kinds of solutions likely to be called for in response.
Are there clear trends present in the data regarding Entity, Intent, and Timing as they relate to each domain of AI risk?
The analysis reveals varying trends across different AI risk domains concerning Entity (the cause of the risk: Human, AI, or Other), Intent (Intentional, Unintentional, or Other), and Timing (Pre-deployment, Post-deployment, or Other). For example, risks in the Discrimination & toxicity, Misinformation, and AI system safety, failures & limitations domains are more frequently presented as caused by AI systems rather than human actions. Specific subdomains exhibit this trait more clearly, such as the “False or misleading information” subdomain within Misinformation, where 94% of risks are attributed to AI. This contrasts sharply with subdomains like “Compromise of privacy by obtaining, leaking or correctly inferring sensitive information,” where AI is identified as the causal entity in only 62% of instances, indicating less consensus regarding the source of privacy risks. In contrast, Humans are presented as the primary cause for risks related to the Malicious actors & misuse domain, suggesting a perception that these risks stem from deliberate human actions rather than the AI’s inherent behavior. While some risks are attributed overwhelmingly to human decisions (e.g., “Power centralization and unfair distribution of benefits”), others receive mixed attribution, signifying divergent perspectives on accountability. These differences highlight varying perceptions and framings of AI risks depending on what aspect of harm people are concerned about.
Regarding Intent, the Malicious actors & misuse domain overwhelmingly associates risks with intentionality, while subdomains such as “Unfair discrimination and misrepresentation” and “Unequal performance across groups” attribute risks mainly to unintentional behavior. This divergence reflects different understandings of how harm manifests: as a deliberate outcome in domains like misuse, or as an unintended consequence in domains like discrimination. Areas such as “Power centralization and unfair distribution of benefits” and “Governance failure”, by contrast, show ambiguous or mixed classifications of intent, suggesting that these risks can involve deliberate decisions about structure and governance, unintended accidents as systems and policies are put in place, or no clearly attributable intent at all, emerging diffusely and without a defined direction.
Most risks, across most domains, are presented as occurring Post-deployment. However, a few subdomains show ambiguity or roughly equal measures of pre- and post-deployment concern. These ambiguities around timing imply a complex interplay of risks throughout a system’s lifecycle: rather than isolated pre- or post-deployment events, many risks appear as continuing occurrences with causes traceable to multiple stages of development and use. This detailed breakdown offers a nuanced picture of the AI risk landscape, providing the understanding needed to devise targeted mitigation strategies and policies across varied AI domains and contexts.
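As an illustration of how such proportions can be derived, the sketch below computes the share of risk entries coded with AI as the causal entity within each subdomain. The toy table, column names, and abbreviated subdomain labels are assumptions, and the output does not reproduce the repository’s actual figures (such as the 94% and 62% cited above).

```python
import pandas as pd

# Toy coded-risk table; rows, column names, and labels are illustrative only.
risks = pd.DataFrame({
    "subdomain": ["False or misleading information"] * 4
                 + ["Compromise of privacy"] * 4,
    "entity": ["AI", "AI", "AI", "Human", "AI", "Human", "Other", "AI"],
})

# Share of entries in each subdomain attributed to AI as the causal entity (%).
ai_share = (
    risks["entity"].eq("AI")
    .groupby(risks["subdomain"])
    .mean()
    .mul(100)
    .round(1)
)
print(ai_share)
```

The same grouping pattern extends directly to the Intent and Timing columns, which is how per-domain trends of the kind described above can be tabulated.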
What are the primary domains of AI risk?
The risks associated with Artificial Intelligence (AI) are complex and multifaceted, spanning several key domains. This paper synthesizes a Domain Taxonomy of AI Risks, grouping these risks into seven primary domains to provide a comprehensive overview. These domains are discrimination and toxicity, covering issues of unfair bias and harmful content; privacy and security, addressing data breaches and system vulnerabilities; misinformation, focusing on the spread of false or misleading information; malicious actors and misuse, highlighting the potential for AI in cyberattacks and manipulation; human-computer interaction, exploring overreliance and loss of agency; socioeconomic and environmental harms, examining inequality and ecological impact; and AI system safety, failures, and limitations, including issues of goal misalignment and lack of robustness. Each domain offers a specific lens through which to understand and address the various ways in which AI can pose risks to individuals, society, and the environment.
Each primary domain is further divided into subdomains to provide more granular understanding of specific risks. For example, the discrimination and toxicity domain includes subdomains for unfair discrimination and misrepresentation, exposure to toxic content, and unequal performance across groups. Similarly, the privacy and security domain is broken down into compromise of privacy by obtaining, leaking, or correctly inferring sensitive information, and AI system security vulnerabilities and attacks. This detailed categorization allows for a more targeted approach to risk assessment and mitigation strategies. The prevalence of these domains in existing literature varies, with AI system safety, failures, & limitations, socioeconomic & environmental harms, and discrimination & toxicity being the most frequently discussed, suggesting these areas are of particular concern to researchers and practitioners.
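For reference, a sketch of the Domain Taxonomy’s top level is shown below. Only the subdomains explicitly named above are filled in; the remaining entries are left empty rather than guessed, since the full taxonomy comprises 23 subdomains.

```python
# Seven top-level domains of the Domain Taxonomy of AI Risks.
# Subdomains are listed only where they are named in the text above;
# the full taxonomy contains 23 subdomains in total.
DOMAIN_TAXONOMY = {
    "Discrimination & toxicity": [
        "Unfair discrimination and misrepresentation",
        "Exposure to toxic content",
        "Unequal performance across groups",
    ],
    "Privacy & security": [
        "Compromise of privacy by obtaining, leaking or correctly "
        "inferring sensitive information",
        "AI system security vulnerabilities and attacks",
    ],
    "Misinformation": [],                            # subdomains omitted here
    "Malicious actors & misuse": [],                 # subdomains omitted here
    "Human-computer interaction": [],                # subdomains omitted here
    "Socioeconomic & environmental harms": [],       # subdomains omitted here
    "AI system safety, failures & limitations": [],  # subdomains omitted here
}
```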
Most and least discussed areas
While many domains are well-explored, some remain relatively underexamined. AI welfare and rights, pollution of the information ecosystem and loss of consensus reality, and competitive dynamics all receive less focus in the current literature. This disparity highlights potential gaps in AI risk research, indicating areas that may require further attention and investigation. By understanding the scope and prevalence of risks within each domain, stakeholders can better prioritize their efforts and develop more effective strategies for mitigating the potential harms associated with AI development and deployment.
What implications does the AI Risk Repository have for key audiences such as policymakers?
The AI Risk Repository offers several specific benefits for policymakers navigating the complex landscape of AI regulation. Firstly, it provides a concrete foundation for operationalizing the frequently cited, yet often vaguely defined, terms “harm” and “risk” that appear in AI regulatory frameworks. By offering a detailed catalog of potential risks, the repository enables the development of clear, measurable compliance metrics. These metrics can then be used to effectively monitor adherence to established standards, ensuring that AI systems are developed and deployed responsibly. In essence, the repository brings clarity and specificity to regulatory language, facilitating more effective enforcement and risk mitigation.
Secondly, the AI Risk Repository fosters international collaboration by providing a common language and shared criteria for discussing AI risks. This is particularly important as AI technologies transcend national borders, requiring coordinated regulatory approaches. Bodies such as the EU-US Trade and Technology Council, which are working to develop shared repositories of metrics and methodologies for assessing AI trustworthiness, can leverage the AI Risk Repository to promote interoperability between regulatory frameworks. By harmonizing terminology and providing a unified classification system for AI risks, the repository facilitates the development of global standards that promote responsible AI innovation worldwide. Moreover, the AI Risk Repository offers a comprehensive, up-to-date database of AI risks that assists policymakers in effectively prioritizing resources, tracking emergent risk trends, and creating targeted training programs to address key vulnerabilities within the AI ecosystem.
What are the limitations of this study?
This study, while providing a valuable synthesis of AI risk frameworks, acknowledges certain limitations that should be considered when interpreting its findings. First, the comprehensiveness of the AI Risk Repository relies heavily on the availability and quality of the documented taxonomies and classifications. The exclusion of domain-specific (e.g., healthcare) or location-specific (e.g., a particular country) frameworks limits the generalizability of the findings to broader contexts. Additionally, the reliance on extraction and coding by single expert reviewers introduces the potential for subjective bias and interpretation error, affecting the neutrality of the assembled classifications. Although efforts were made to capture risks as presented by the original authors, ambiguities in the source materials may have led to unintentional misinterpretations or omissions, possibly influencing the final content of the repository.
Moreover, the AI Risk Repository is conceived as a foundation for general use, trading some precision for clarity, simplicity, and exhaustiveness. As such, it may not be perfectly suited to every specific context or use case, such as technical risk evaluations requiring more nuanced analyses or granular categorizations. The binary labeling of pre- versus post-deployment risks could benefit from a more elaborate representation with several distinct stages, covering the progression of AI systems from design to long-term application. Furthermore, the risk analysis lacks quantitative dimensions: it does not capture the impact or likelihood of risks, interdependencies across different risks, or the important distinction between instrumental and terminal risks. These omissions limit its utility for prioritization or for balancing risk mitigation against benefit maximization.
Going forward, future research can improve upon this work by refining the consistency, specificity, and coherence of the definitions used for AI risks, potentially integrating semantic tools such as ontologies to enable a more shared understanding. More attention could be given to risk areas that remain relatively unexplored in the literature, such as AI agents beyond language models, pre-deployment risks caused by humans, and AI rights and welfare considerations. Future iterations might also incorporate variables such as threat vectors (e.g., bio, cyber), AI classification (e.g., agentic, generative), openness (open source or closed), organizational type (e.g., big tech or startup), and type of harm, such as economic loss or danger to human life. By acknowledging and addressing these areas for improvement, future work can continue building a coordinated approach to defining, auditing, and managing the varied risks presented by AI systems.
Ultimately, this work illuminates the multifaceted nature of AI risk, moving beyond simplistic notions to reveal the complex interplay of causal factors like the responsible entity, the intent behind actions, and the timing of risk manifestation. The identified trends, or lack thereof, across different AI domains underscore the evolving and often inconsistent ways we perceive and frame these risks. By providing a structured yet adaptable framework, this repository empowers stakeholders to move towards a more unified understanding, fostering clearer communication, and, crucially, enabling the development of targeted strategies to proactively mitigate the potential harms of AI. This represents a crucial step towards responsibly navigating the promises and perils inherent in this rapidly advancing technology.