.. SPDX-License-Identifier: GPL-2.0

Idmappings
==========

Most filesystem developers will have encountered idmappings. They are used when
reading from or writing ownership to disk, reporting ownership to userspace, or
for permission checking. This document is aimed at filesystem developers that
want to know how idmappings work.

Formal notes
------------

An idmapping is essentially a translation of a range of ids into another or the
same range of ids. The notational convention for idmappings that is widely used
in userspace is::

 u:k:r

``u`` indicates the first element in the upper idmapset ``U`` and ``k``
indicates the first element in the lower idmapset ``K``. The ``r`` parameter
indicates the range of the idmapping, i.e. how many ids are mapped. From now
on, we will always prefix ids with ``u`` or ``k`` to make it clear whether
we're talking about an id in the upper or lower idmapset.

To see what this looks like in practice, let's take the following idmapping::

 u22:k10000:r3

and write down the mappings it will generate::

 u22 -> k10000
 u23 -> k10001
 u24 -> k10002

From a mathematical viewpoint ``U`` and ``K`` are well-ordered sets and an
idmapping is an order isomorphism from ``U`` into ``K``. So ``U`` and ``K`` are
order isomorphic. In fact, ``U`` and ``K`` are always well-ordered subsets of
the set of all possible ids usable on a given system.

Looking at this mathematically briefly will help us highlight some properties
that make it easier to understand how we can translate between idmappings. For
example, we know that the inverse idmapping is an order isomorphism as well::

 k10000 -> u22
 k10001 -> u23
 k10002 -> u24

Given that we are dealing with order isomorphisms plus the fact that we're
dealing with subsets we can embed idmappings into each other, i.e. we can
sensibly translate between different idmappings.
For example, assume we've been
given the three idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r10000
 3. u0:k30000:r10000

and id ``k11000`` which has been generated by the first idmapping by mapping
``u1000`` from the upper idmapset down to ``k11000`` in the lower idmapset.

Because we're dealing with order isomorphic subsets it is meaningful to ask
what id ``k11000`` corresponds to in the second or third idmapping. The
straightforward algorithm to use is to apply the inverse of the first idmapping,
mapping ``k11000`` up to ``u1000``. Afterwards, we can map ``u1000`` down using
either the second or the third idmapping. The second idmapping would map
``u1000`` down to ``k21000``. The third idmapping would map ``u1000`` down to
``k31000``.

If we were given the same task for the following three idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r200
 3. u0:k30000:r300

we would fail to translate as the sets aren't order isomorphic over the full
range of the first idmapping anymore (however, they are order isomorphic over
the full range of the second idmapping). Neither the second nor the third
idmapping contains ``u1000`` in the upper idmapset ``U``. This is equivalent to
not having an id mapped. We can simply say that ``u1000`` is unmapped in the
second and third idmapping. The kernel will report unmapped ids as the
overflowuid ``(uid_t)-1`` or overflowgid ``(gid_t)-1`` to userspace.

The algorithm to calculate what a given id maps to is pretty simple. First, we
need to verify that the range can contain our target id. We will skip this step
for simplicity.
After that if we want to know what ``id`` maps to we can do
simple calculations:

- If we want to map from left to right::

   u:k:r
   id - u + k = n

- If we want to map from right to left::

   u:k:r
   id - k + u = n

Instead of "left to right" we can also say "down" and instead of "right to
left" we can also say "up". Obviously mapping down and up invert each other.

To see whether the simple formulas above work, consider the following two
idmappings::

 1. u0:k20000:r10000
 2. u500:k30000:r10000

Assume we are given ``k21000`` in the lower idmapset of the first idmapping. We
want to know what id this was mapped from in the upper idmapset of the first
idmapping. So we're mapping up in the first idmapping::

 id     - k      + u  = n
 k21000 - k20000 + u0 = u1000

Now assume we are given the id ``u1100`` in the upper idmapset of the second
idmapping and we want to know what this id maps down to in the lower idmapset
of the second idmapping. This means we're mapping down in the second
idmapping::

 id    - u    + k      = n
 u1100 - u500 + k30000 = k30600

General notes
-------------

In the context of the kernel an idmapping can be interpreted as mapping a range
of userspace ids into a range of kernel ids::

 userspace-id:kernel-id:range

A userspace id is always an element in the upper idmapset of an idmapping of
type ``uid_t`` or ``gid_t`` and a kernel id is always an element in the lower
idmapset of an idmapping of type ``kuid_t`` or ``kgid_t``. From now on
"userspace id" will be used to refer to the well known ``uid_t`` and ``gid_t``
types and "kernel id" will be used to refer to ``kuid_t`` and ``kgid_t``.

The kernel is mostly concerned with kernel ids. They are used when performing
permission checks and are stored in an inode's ``i_uid`` and ``i_gid`` field.
A userspace id on the other hand is an id that is reported to userspace by the
kernel, or is passed by userspace to the kernel, or a raw device id that is
written or read from disk.

Note that we are only concerned with idmappings as the kernel stores them, not
how userspace would specify them.

For the rest of this document we will prefix all userspace ids with ``u`` and
all kernel ids with ``k``. Ranges of idmappings will be prefixed with ``r``. So
an idmapping will be written as ``u0:k10000:r10000``.

For example, within this idmapping, the id ``u1000`` is an id in the upper
idmapset or "userspace idmapset" starting with ``u0``. And it is mapped to
``k11000`` which is a kernel id in the lower idmapset or "kernel idmapset"
starting with ``k10000``.

A kernel id is always created by an idmapping. Such idmappings are associated
with user namespaces. Since we mainly care about how idmappings work we're not
going to be concerned with how idmappings are created nor how they are used
outside of the filesystem context. This is best left to an explanation of user
namespaces.

The initial user namespace is special. It always has an idmapping of the
following form::

 u0:k0:r4294967295

which is an identity idmapping over the full range of ids available on this
system.

Other user namespaces usually have non-identity idmappings such as::

 u0:k10000:r10000

When a process creates or wants to change ownership of a file, or when the
ownership of a file is read from disk by a filesystem, the userspace id is
immediately translated into a kernel id according to the idmapping associated
with the relevant user namespace.

For instance, consider a file that is stored on disk by a filesystem as being
owned by ``u1000``:

- If a filesystem were to be mounted in the initial user namespace (as most
  filesystems are) then the initial idmapping will be used. As we saw this is
  simply the identity idmapping. This would mean id ``u1000`` read from disk
  would be mapped to id ``k1000``. So an inode's ``i_uid`` and ``i_gid`` fields
  would contain ``k1000``.

- If a filesystem were to be mounted with an idmapping of ``u0:k10000:r10000``
  then ``u1000`` read from disk would be mapped to ``k11000``. So an inode's
  ``i_uid`` and ``i_gid`` would contain ``k11000``.

Translation algorithms
----------------------

We've already seen briefly that it is possible to translate between different
idmappings. We'll now take a closer look at how that works.

Crossmapping
~~~~~~~~~~~~

This translation algorithm is used by the kernel in quite a few places. For
example, it is used when reporting back the ownership of a file to userspace
via the ``stat()`` system call family.

If we've been given ``k11000`` from one idmapping we can map that id up in
another idmapping. In order for this to work both idmappings need to contain
the same kernel id in their kernel idmapsets. For example, consider the
following idmappings::

 1. u0:k10000:r10000
 2. u20000:k10000:r10000

and we are mapping ``u1000`` down to ``k11000`` in the first idmapping. We can
then translate ``k11000`` into a userspace id in the second idmapping using the
kernel idmapset of the second idmapping::

 /* Map the kernel id up into a userspace id in the second idmapping. */
 from_kuid(u20000:k10000:r10000, k11000) = u21000

Note how we can get back to the kernel id in the first idmapping by inverting
the algorithm::

 /* Map the userspace id down into a kernel id in the second idmapping. */
 make_kuid(u20000:k10000:r10000, u21000) = k11000

 /* Map the kernel id up into a userspace id in the first idmapping. */
 from_kuid(u0:k10000:r10000, k11000) = u1000

This algorithm allows us to answer the question what userspace id a given
kernel id corresponds to in a given idmapping. In order to be able to answer
this question both idmappings need to contain the same kernel id in their
respective kernel idmapsets.

For example, when the kernel reads a raw userspace id from disk it maps it down
into a kernel id according to the idmapping associated with the filesystem.
Let's assume the filesystem was mounted with an idmapping of
``u0:k20000:r10000`` and it reads a file owned by ``u1000`` from disk. This
means ``u1000`` will be mapped to ``k21000`` which is what will be stored in
the inode's ``i_uid`` and ``i_gid`` fields.

When someone in userspace calls ``stat()`` or a related function to get
ownership information about the file the kernel can't simply map the id back up
according to the filesystem's idmapping as this would give the wrong owner if
the caller is using an idmapping.

So the kernel will map the id back up in the idmapping of the caller. Let's
assume the caller has the slightly unconventional idmapping
``u3000:k20000:r10000`` then ``k21000`` would map back up to ``u4000``.
Consequently the user would see that this file is owned by ``u4000``.

Remapping
~~~~~~~~~

It is possible to translate a kernel id from one idmapping to another one via
the userspace idmapset of the two idmappings. This is equivalent to remapping
a kernel id.

Let's look at an example. We are given the following two idmappings::

 1. u0:k10000:r10000
 2. u0:k20000:r10000

and we are given ``k11000`` in the first idmapping. In order to translate this
kernel id in the first idmapping into a kernel id in the second idmapping we
need to perform two steps:

1. Map the kernel id up into a userspace id in the first idmapping::

    /* Map the kernel id up into a userspace id in the first idmapping. */
    from_kuid(u0:k10000:r10000, k11000) = u1000

2. Map the userspace id down into a kernel id in the second idmapping::

    /* Map the userspace id down into a kernel id in the second idmapping. */
    make_kuid(u0:k20000:r10000, u1000) = k21000

As you can see we used the userspace idmapset in both idmappings to translate
the kernel id in one idmapping to a kernel id in another idmapping.

This allows us to answer the question what kernel id we would need to use to
get the same userspace id in another idmapping. In order to be able to answer
this question both idmappings need to contain the same userspace id in their
respective userspace idmapsets.

Note how we can easily get back to the kernel id in the first idmapping by
inverting the algorithm:

1. Map the kernel id up into a userspace id in the second idmapping::

    /* Map the kernel id up into a userspace id in the second idmapping. */
    from_kuid(u0:k20000:r10000, k21000) = u1000

2. Map the userspace id down into a kernel id in the first idmapping::

    /* Map the userspace id down into a kernel id in the first idmapping. */
    make_kuid(u0:k10000:r10000, u1000) = k11000

Another way to look at this translation is to treat it as inverting one
idmapping and applying another idmapping if both idmappings have the relevant
userspace id mapped. This will come in handy when working with idmapped mounts.

Invalid translations
~~~~~~~~~~~~~~~~~~~~

It is never valid to use an id in the kernel idmapset of one idmapping as the
id in the userspace idmapset of another or the same idmapping. While the kernel
idmapset always indicates an idmapset in the kernel id space the userspace
idmapset indicates a userspace id.
So the following translations are forbidden::

 /* Map the userspace id down into a kernel id in the first idmapping. */
 make_kuid(u0:k10000:r10000, u1000) = k11000

 /* INVALID: Map the kernel id down into a kernel id in the second idmapping. */
 make_kuid(u10000:k20000:r10000, k11000) = k21000
                                 ~~~~~~

and equally wrong::

 /* Map the kernel id up into a userspace id in the first idmapping. */
 from_kuid(u0:k10000:r10000, k11000) = u1000

 /* INVALID: Map the userspace id up into a userspace id in the second idmapping. */
 from_kuid(u20000:k0:r10000, u1000) = k21000
                             ~~~~~

Idmappings when creating filesystem objects
-------------------------------------------

The concepts of mapping an id down or mapping an id up are expressed in the two
kernel functions filesystem developers are rather familiar with and which we've
already used in this document::

 /* Map the userspace id down into a kernel id. */
 make_kuid(idmapping, uid)

 /* Map the kernel id up into a userspace id. */
 from_kuid(idmapping, kuid)

We will take an abbreviated look into how idmappings figure into creating
filesystem objects. For simplicity we will only look at what happens when the
VFS has already completed path lookup right before it calls into the filesystem
itself. So we're concerned with what happens when e.g. ``vfs_mkdir()`` is
called. We will also assume that the directory we're creating filesystem
objects in is readable and writable for everyone.

When creating a filesystem object the kernel will look at the caller's
filesystem ids. These are just regular ``uid_t`` and ``gid_t`` userspace ids
but they are exclusively used when determining file ownership which is why they
are called "filesystem ids". They are usually identical to the uid and gid of
the caller but can differ. We will just assume they are always identical to not
get lost in too many details.
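
The mapping-down and mapping-up operations named at the start of this section
can be modelled in a few lines of userspace C. The sketch below is only a toy
model of the ``u:k:r`` arithmetic from the formal notes: ``struct toy_idmap``,
``toy_make_kid()``, ``toy_from_kid()`` and ``TOY_UNMAPPED`` are hypothetical
names invented for illustration, a single extent is assumed, and none of this
is the kernel's actual implementation or API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for "this id has no mapping" in the toy model. */
#define TOY_UNMAPPED ((uint32_t)-1)

/* Hypothetical toy type: one idmapping extent written as u:k:r. */
struct toy_idmap {
	uint32_t u;	/* first id in the upper (userspace) idmapset */
	uint32_t k;	/* first id in the lower (kernel) idmapset */
	uint32_t r;	/* number of ids that are mapped */
};

/* Map down: id - u + k, after checking that id lies in [u, u + r). */
uint32_t toy_make_kid(const struct toy_idmap *m, uint32_t uid)
{
	if (uid < m->u || uid - m->u >= m->r)
		return TOY_UNMAPPED;
	return uid - m->u + m->k;
}

/* Map up: id - k + u, after checking that id lies in [k, k + r). */
uint32_t toy_from_kid(const struct toy_idmap *m, uint32_t kid)
{
	if (kid < m->k || kid - m->k >= m->r)
		return TOY_UNMAPPED;
	return kid - m->k + m->u;
}

/* Replays the two worked formulas from the formal notes. */
void toy_check_formulas(void)
{
	const struct toy_idmap first = { 0, 20000, 10000 };
	const struct toy_idmap second = { 500, 30000, 10000 };

	assert(toy_from_kid(&first, 21000) == 1000);	/* k21000 -> u1000 */
	assert(toy_make_kid(&second, 1100) == 30600);	/* u1100 -> k30600 */
	assert(toy_make_kid(&second, 100) == TOY_UNMAPPED);	/* u100 < u500 */
}
```

Unlike the simplified formulas earlier in this document, the sketch performs
the range check first, so an id outside the extent is reported as unmapped
instead of producing a meaningless number.
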

When the caller enters the kernel two things happen:

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping.
   (To be precise, the kernel will simply look at the kernel ids stashed in the
   credentials of the current task but for our education we'll pretend this
   translation happens just in time.)
2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping.

The second step is important as regular filesystems will ultimately need to map
the kernel id back up into a userspace id when writing to disk.
So with the second step the kernel guarantees that a valid userspace id can be
written to disk. If it can't the kernel will refuse the creation request to not
even remotely risk filesystem corruption.

The astute reader will have realized that this is simply a variation of the
crossmapping algorithm we mentioned above in a previous section. First, the
kernel maps the caller's userspace id down into a kernel id according to the
caller's idmapping and then maps that kernel id up according to the
filesystem's idmapping.

Let's see some examples with caller/filesystem idmapping but without mount
idmappings. This will exhibit some problems we can hit. After that we will
revisit/reconsider these examples, this time using mount idmappings, to see how
they can solve the problems we observed before.

Example 1
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295

Both the caller and the filesystem use the identity idmapping:

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping.

   For this second step the kernel will call the function
   ``fsuidgid_has_mapping()`` which ultimately boils down to calling
   ``from_kuid()``::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

In this example both idmappings are the same so there's nothing exciting going
on. Ultimately the userspace id that lands on disk will be ``u1000``.

Example 2
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k20000:r10000, k11000) = u-1

It's immediately clear that while the caller's userspace id could be
successfully mapped down into kernel ids in the caller's idmapping the kernel
ids could not be mapped up according to the filesystem's idmapping. So the
kernel will deny this creation request.

Note that while this example is less common, because most filesystems can't be
mounted with non-initial idmappings, this is a general problem as we can see in
the next examples.

Example 3
~~~~~~~~~

::

 caller id:            u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295

1. Map the caller's userspace ids down into kernel ids in the caller's
   idmapping::

    make_kuid(u0:k10000:r10000, u1000) = k11000

2. Verify that the caller's kernel ids can be mapped up to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k11000) = u11000

We can see that the translation always succeeds. The userspace id that the
filesystem will ultimately put to disk will always be identical to the value of
the kernel id that was created in the caller's idmapping.
This has mainly two
consequences.

First, that we can't allow a caller to ultimately write to disk with another
userspace id. We could only do this if we were to mount the whole filesystem
with the caller's or another idmapping. But that solution is limited to a few
filesystems and not very flexible. But this is a use-case that is pretty
important in containerized workloads.

Second, the caller will usually not be able to create any files or access
directories that have stricter permissions because none of the filesystem's
kernel ids map up into valid userspace ids in the caller's idmapping:

1. Map raw userspace ids down to kernel ids in the filesystem's idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Map kernel ids up to userspace ids in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k1000) = u-1

Example 4
~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295

In order to report ownership to userspace the kernel uses the crossmapping
algorithm introduced in a previous section:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k1000) = u-1

The crossmapping algorithm fails in this case because the kernel id in the
filesystem idmapping cannot be mapped up to a userspace id in the caller's
idmapping. Thus, the kernel will report the ownership of this file as the
overflowid.

Example 5
~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000

In order to report ownership to userspace the kernel uses the crossmapping
algorithm introduced in a previous section:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k21000) = u-1

Again, the crossmapping algorithm fails in this case because the kernel id in
the filesystem idmapping cannot be mapped to a userspace id in the caller's
idmapping. Thus, the kernel will report the ownership of this file as the
overflowid.

Note how in the last two examples things would be simple if the caller were
using the initial idmapping. For a filesystem mounted with the initial
idmapping it would be trivial. So we only consider a filesystem with an
idmapping of ``u0:k20000:r10000``:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k0:r4294967295, k21000) = u21000

Idmappings on idmapped mounts
-----------------------------

The examples we've seen in the previous section where the caller's idmapping
and the filesystem's idmapping are incompatible cause various issues for
workloads. For a more complex but common example, consider two containers
started on the host. To completely prevent the two containers from affecting
each other, an administrator may often use different non-overlapping idmappings
for the two containers::

 container1 idmapping:  u0:k10000:r10000
 container2 idmapping:  u0:k20000:r10000
 filesystem idmapping:  u0:k30000:r10000

An administrator wanting to provide easy read-write access to the following set
of files::

 dir id:       u0
 dir/file1 id: u1000
 dir/file2 id: u2000

to both containers currently can't.
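
Why neither container can use these files follows directly from the
crossmapping algorithm. The following userspace C sketch replays it for the
three idmappings above. It is a toy model under stated assumptions: the names
``struct toy_idmap``, ``toy_down()``, ``toy_up()``, ``toy_crossmap()`` and
``TOY_UNMAPPED`` are hypothetical, each idmapping has a single extent, and the
code does not mirror how the kernel implements ``stat()``.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for an id without a mapping (toy model only). */
#define TOY_UNMAPPED ((uint32_t)-1)

/* Hypothetical toy type: one idmapping extent written as u:k:r. */
struct toy_idmap {
	uint32_t u, k, r;
};

/* Map a userspace id down into a kernel id: id - u + k. */
static uint32_t toy_down(const struct toy_idmap *m, uint32_t uid)
{
	if (uid < m->u || uid - m->u >= m->r)
		return TOY_UNMAPPED;
	return uid - m->u + m->k;
}

/* Map a kernel id up into a userspace id: id - k + u. */
static uint32_t toy_up(const struct toy_idmap *m, uint32_t kid)
{
	if (kid < m->k || kid - m->k >= m->r)
		return TOY_UNMAPPED;
	return kid - m->k + m->u;
}

/* Crossmapping: on-disk id -> filesystem kernel id -> caller userspace id. */
uint32_t toy_crossmap(const struct toy_idmap *fs,
		      const struct toy_idmap *caller, uint32_t disk_id)
{
	uint32_t kid = toy_down(fs, disk_id);

	if (kid == TOY_UNMAPPED)
		return TOY_UNMAPPED;
	return toy_up(caller, kid);
}

/* Replays the container scenario: no file id is visible in either container. */
void toy_check_containers(void)
{
	const struct toy_idmap c1 = { 0, 10000, 10000 };
	const struct toy_idmap c2 = { 0, 20000, 10000 };
	const struct toy_idmap fs = { 0, 30000, 10000 };
	const uint32_t ids[] = { 0, 1000, 2000 };

	for (int i = 0; i < 3; i++) {
		assert(toy_crossmap(&fs, &c1, ids[i]) == TOY_UNMAPPED);
		assert(toy_crossmap(&fs, &c2, ids[i]) == TOY_UNMAPPED);
	}
}
```

In this model every file maps down to a kernel id in the range starting at
``k30000``, which lies outside the kernel idmapsets of both containers, so
both would see the overflowid as owner.
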

Of course the administrator has the option to recursively change ownership via
``chown()``. For example, they could change ownership so that ``dir`` and all
files below it can be crossmapped from the filesystem's into the container's
idmapping. Let's assume they change ownership so it is compatible with the
first container's idmapping::

 dir id:       u10000
 dir/file1 id: u11000
 dir/file2 id: u12000

This would still leave ``dir`` rather useless to the second container. In fact,
``dir`` and all files below it would continue to appear owned by the overflowid
for the second container.

Or consider another increasingly popular example. Some service managers such as
systemd implement a concept called "portable home directories". A user may want
to use their home directories on different machines where they are assigned
different login userspace ids. Most users will have ``u1000`` as the login id
on their machine at home and all files in their home directory will usually be
owned by ``u1000``. At uni or at work they may have another login id such as
``u1125``. This makes it rather difficult to interact with their home directory
on their work machine.

In both cases changing ownership recursively has grave implications. The most
obvious one is that ownership is changed globally and permanently. In the home
directory case this change in ownership would even need to happen every time the
user switches from their home to their work machine. For really large sets of
files this becomes increasingly costly.

If the user is lucky, they are dealing with a filesystem that is mountable
inside user namespaces. But this would also change ownership globally and the
change in ownership is tied to the lifetime of the filesystem mount, i.e. the
superblock. The only way to change ownership is to completely unmount the
filesystem and mount it again in another user namespace.
This is usually
impossible because it would mean that all users currently accessing the
filesystem can't anymore. And it means that ``dir`` still can't be shared
between two containers with different idmappings.
But usually the user doesn't even have this option since most filesystems
aren't mountable inside containers. And not having them mountable might be
desirable as it doesn't require the filesystem to deal with malicious
filesystem images.

But the use cases mentioned above and more can be handled by idmapped mounts.
They allow exposing the same set of dentries with different ownership at
different mounts. This is achieved by marking the mounts with a user namespace
through the ``mount_setattr()`` system call. The idmapping associated with it
is then used to translate from the caller's idmapping to the filesystem's
idmapping and vice versa using the remapping algorithm we introduced above.

Idmapped mounts make it possible to change ownership in a temporary and
localized way. The ownership changes are restricted to a specific mount and the
ownership changes are tied to the lifetime of the mount. All other users and
locations where the filesystem is exposed are unaffected.

Filesystems that support idmapped mounts don't have any real reason to support
being mountable inside user namespaces. A filesystem could be exposed
completely under an idmapped mount to get the same effect. This has the
advantage that filesystems can leave the creation of the superblock to
privileged users in the initial user namespace.

However, it is perfectly possible to combine idmapped mounts with filesystems
mountable inside user namespaces. We will touch on this further below.

Remapping helpers
~~~~~~~~~~~~~~~~~

Idmapping functions were added that translate between idmappings. They make use
of the remapping algorithm we've introduced earlier.
We're going to look at
two:

- ``i_uid_into_mnt()`` and ``i_gid_into_mnt()``

  The ``i_*id_into_mnt()`` functions translate the filesystem's kernel ids into
  kernel ids in the mount's idmapping::

   /* Map the filesystem's kernel id up into a userspace id in the filesystem's idmapping. */
   from_kuid(filesystem, kid) = uid

   /* Map the filesystem's userspace id down into a kernel id in the mount's idmapping. */
   make_kuid(mount, uid) = kuid

- ``mapped_fsuid()`` and ``mapped_fsgid()``

  The ``mapped_fs*id()`` functions translate the caller's kernel ids into
  kernel ids in the filesystem's idmapping. This translation is achieved by
  remapping the caller's kernel ids using the mount's idmapping::

   /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
   from_kuid(mount, kid) = uid

   /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
   make_kuid(filesystem, uid) = kuid

Note that these two functions invert each other. Consider the following
idmappings::

 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

Assume a file owned by ``u1000`` is read from disk. The filesystem maps this id
to ``k21000`` according to its idmapping. This is what is stored in the
inode's ``i_uid`` and ``i_gid`` fields.

When the caller queries the ownership of this file via ``stat()`` the kernel
would usually simply use the crossmapping algorithm and map the filesystem's
kernel id up to a userspace id in the caller's idmapping.

But when the caller is accessing the file on an idmapped mount the kernel will
first call ``i_uid_into_mnt()`` thereby translating the filesystem's kernel id
into a kernel id in the mount's idmapping::

 i_uid_into_mnt(k21000):
   /* Map the filesystem's kernel id up into a userspace id. */
   from_kuid(u0:k20000:r10000, k21000) = u1000

   /* Map the filesystem's userspace id down into a kernel id in the mount's idmapping. */
   make_kuid(u0:k10000:r10000, u1000) = k11000

Finally, when the kernel reports the owner to the caller it will turn the
kernel id in the mount's idmapping into a userspace id in the caller's
idmapping::

 from_kuid(u0:k10000:r10000, k11000) = u1000

We can test whether this algorithm really works by verifying what happens when
we create a new file. Let's say the user is creating a file with ``u1000``.

The kernel maps this to ``k11000`` in the caller's idmapping. Usually the
kernel would now apply the crossmapping, verifying that ``k11000`` can be
mapped to a userspace id in the filesystem's idmapping. Since ``k11000`` can't
be mapped up in the filesystem's idmapping directly this creation request
fails.

But when the caller is accessing the file on an idmapped mount the kernel will
first call ``mapped_fs*id()`` thereby translating the caller's kernel id into
a kernel id according to the mount's idmapping::

 mapped_fsuid(k11000):
    /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
    from_kuid(u0:k10000:r10000, k11000) = u1000

    /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
    make_kuid(u0:k20000:r10000, u1000) = k21000

When finally writing to disk the kernel will then map ``k21000`` up into a
userspace id in the filesystem's idmapping::

 from_kuid(u0:k20000:r10000, k21000) = u1000

As we can see, we end up with an invertible and therefore information
preserving algorithm. A file created from ``u1000`` on an idmapped mount will
also be reported as being owned by ``u1000`` and vice versa.

Let's now briefly reconsider the failing examples from earlier in the context
of idmapped mounts.
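
The two remapping directions walked through above can be written down as a
short userspace C sketch. It is a toy model only: ``struct toy_idmap``,
``toy_down()``, ``toy_up()``, ``toy_i_uid_into_mnt()``, ``toy_mapped_fsuid()``
and ``TOY_UNMAPPED`` are hypothetical names chosen for illustration, each
idmapping is assumed to have a single extent, and the signatures do not match
those of the kernel helpers they are named after.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for an id without a mapping (toy model only). */
#define TOY_UNMAPPED ((uint32_t)-1)

/* Hypothetical toy type: one idmapping extent written as u:k:r. */
struct toy_idmap {
	uint32_t u, k, r;
};

/* Map a userspace id down into a kernel id: id - u + k. */
static uint32_t toy_down(const struct toy_idmap *m, uint32_t uid)
{
	if (uid < m->u || uid - m->u >= m->r)
		return TOY_UNMAPPED;
	return uid - m->u + m->k;
}

/* Map a kernel id up into a userspace id: id - k + u. */
static uint32_t toy_up(const struct toy_idmap *m, uint32_t kid)
{
	if (kid < m->k || kid - m->k >= m->r)
		return TOY_UNMAPPED;
	return kid - m->k + m->u;
}

/* Filesystem kernel id -> mount kernel id: up in fs, down in mnt. */
uint32_t toy_i_uid_into_mnt(const struct toy_idmap *fs,
			    const struct toy_idmap *mnt, uint32_t kid)
{
	uint32_t uid = toy_up(fs, kid);

	return uid == TOY_UNMAPPED ? TOY_UNMAPPED : toy_down(mnt, uid);
}

/* Caller kernel id -> filesystem kernel id: up in mnt, down in fs. */
uint32_t toy_mapped_fsuid(const struct toy_idmap *mnt,
			  const struct toy_idmap *fs, uint32_t kid)
{
	uint32_t uid = toy_up(mnt, kid);

	return uid == TOY_UNMAPPED ? TOY_UNMAPPED : toy_down(fs, uid);
}

/* Replays the walk-through: the two directions invert each other. */
void toy_check_remap(void)
{
	const struct toy_idmap fs = { 0, 20000, 10000 };
	const struct toy_idmap mnt = { 0, 10000, 10000 };

	assert(toy_i_uid_into_mnt(&fs, &mnt, 21000) == 11000);
	assert(toy_mapped_fsuid(&mnt, &fs, 11000) == 21000);
}
```

Both functions are the same remapping step with the roles of the filesystem's
and the mount's idmapping swapped, which is why applying one after the other
returns the original kernel id whenever both translations succeed.
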
719 720Example 2 reconsidered 721~~~~~~~~~~~~~~~~~~~~~~ 722 723:: 724 725 caller id: u1000 726 caller idmapping: u0:k10000:r10000 727 filesystem idmapping: u0:k20000:r10000 728 mount idmapping: u0:k10000:r10000 729 730When the caller is using a non-initial idmapping the common case is to attach 731the same idmapping to the mount. We now perform three steps: 732 7331. Map the caller's userspace ids into kernel ids in the caller's idmapping:: 734 735 make_kuid(u0:k10000:r10000, u1000) = k11000 736 7372. Translate the caller's kernel id into a kernel id in the filesystem's 738 idmapping:: 739 740 mapped_fsuid(k11000): 741 /* Map the kernel id up into a userspace id in the mount's idmapping. */ 742 from_kuid(u0:k10000:r10000, k11000) = u1000 743 744 /* Map the userspace id down into a kernel id in the filesystem's idmapping. */ 745 make_kuid(u0:k20000:r10000, u1000) = k21000 746 7472. Verify that the caller's kernel ids can be mapped to userspace ids in the 748 filesystem's idmapping:: 749 750 from_kuid(u0:k20000:r10000, k21000) = u1000 751 752So the ownership that lands on disk will be ``u1000``. 753 754Example 3 reconsidered 755~~~~~~~~~~~~~~~~~~~~~~ 756 757:: 758 759 caller id: u1000 760 caller idmapping: u0:k10000:r10000 761 filesystem idmapping: u0:k0:r4294967295 762 mount idmapping: u0:k10000:r10000 763 764The same translation algorithm works with the third example. 765 7661. Map the caller's userspace ids into kernel ids in the caller's idmapping:: 767 768 make_kuid(u0:k10000:r10000, u1000) = k11000 769 7702. Translate the caller's kernel id into a kernel id in the filesystem's 771 idmapping:: 772 773 mapped_fsuid(k11000): 774 /* Map the kernel id up into a userspace id in the mount's idmapping. */ 775 from_kuid(u0:k10000:r10000, k11000) = u1000 776 777 /* Map the userspace id down into a kernel id in the filesystem's idmapping. */ 778 make_kuid(u0:k0:r4294967295, u1000) = k1000 779 7802. 
Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So the ownership that lands on disk will be ``u1000``.

Example 4 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u0:k10000:r10000

In order to report ownership to userspace the kernel now does three steps using
the translation algorithm we introduced earlier:

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u0:k10000:r10000, u1000) = k11000

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the caller's kernel id couldn't be crossmapped in the filesystem's
idmapping. With the idmapped mount in place it now can be crossmapped into the
filesystem's idmapping via the mount's idmapping. The file is now reported as
owned by ``u1000`` according to the mount's idmapping.

Example 5 reconsidered
~~~~~~~~~~~~~~~~~~~~~~

::

 file id:              u1000
 caller idmapping:     u0:k10000:r10000
 filesystem idmapping: u0:k20000:r10000
 mount idmapping:      u0:k10000:r10000

Again, in order to report ownership to userspace the kernel now does three
steps using the translation algorithm we introduced earlier:

1.
Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k20000:r10000, u1000) = k21000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k21000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k20000:r10000, k21000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u0:k10000:r10000, u1000) = k11000

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file's kernel id couldn't be crossmapped in the filesystem's
idmapping. With the idmapped mount in place it now can be crossmapped into the
filesystem's idmapping via the mount's idmapping. The file is now owned by
``u1000`` according to the mount's idmapping.

Changing ownership on a home directory
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We've seen above how idmapped mounts can be used to translate between
idmappings when either the caller, the filesystem or both use a non-initial
idmapping. A wide range of use cases exist when the caller is using
a non-initial idmapping. This mostly happens in the context of containerized
workloads. The consequence is, as we have seen, that for both filesystems
mounted with the initial idmapping and filesystems mounted with non-initial
idmappings, access to the filesystem isn't working because the kernel ids can't
be crossmapped between the caller's and the filesystem's idmapping.

As we've seen above idmapped mounts provide a solution to this by remapping the
caller's or filesystem's idmapping according to the mount's idmapping.
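
For the reporting direction, the remapping can be sketched in Python as
follows. The code is illustrative only: ``Idmap`` and ``reported_owner()`` are
hypothetical names, the other functions reuse the helper names from this
document without being the kernel implementations, and ``None`` stands in for
an id that userspace would see as the overflowuid.

```python
from typing import NamedTuple, Optional


class Idmap(NamedTuple):
    """Hypothetical model of one u:k:r idmapping extent."""
    u: int
    k: int
    r: int


def make_kuid(m: Idmap, uid: Optional[int]) -> Optional[int]:
    """Map a userspace id down into a kernel id; None means unmapped."""
    if uid is None or not (m.u <= uid < m.u + m.r):
        return None
    return m.k + (uid - m.u)


def from_kuid(m: Idmap, kuid: Optional[int]) -> Optional[int]:
    """Map a kernel id up into a userspace id; None means unmapped."""
    if kuid is None or not (m.k <= kuid < m.k + m.r):
        return None
    return m.u + (kuid - m.k)


def i_uid_into_mnt(mnt: Idmap, fs: Idmap, kuid: Optional[int]) -> Optional[int]:
    """Remap a file's kernel id: up in the filesystem's, down in the mount's idmapping."""
    return make_kuid(mnt, from_kuid(fs, kuid))


def reported_owner(disk_uid: int, fs: Idmap, caller: Idmap,
                   mnt: Optional[Idmap] = None) -> Optional[int]:
    """Owner the caller sees for an on-disk id, with or without an idmapped mount."""
    kuid = make_kuid(fs, disk_uid)
    if mnt is not None:
        kuid = i_uid_into_mnt(mnt, fs, kuid)
    return from_kuid(caller, kuid)


caller = Idmap(0, 10000, 10000)   # u0:k10000:r10000
mnt = Idmap(0, 10000, 10000)      # u0:k10000:r10000
fs_initial = Idmap(0, 0, 4294967295)  # Example 4
fs_shifted = Idmap(0, 20000, 10000)   # Example 5

for fs in (fs_initial, fs_shifted):
    print(reported_owner(1000, fs, caller), reported_owner(1000, fs, caller, mnt))
```

For both filesystem idmappings the call without a mount idmapping returns
``None`` and the call with the mount idmapping returns ``1000``.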

Aside from containerized workloads, idmapped mounts have the advantage that
they also work when both the caller and the filesystem use the initial
idmapping which means users on the host can change the ownership of directories
and files on a per-mount basis.

Consider our previous example where a user has their home directory on portable
storage. At home they have id ``u1000`` and all files in their home directory
are owned by ``u1000`` whereas at uni or work they have login id ``u1125``.

Taking their home directory with them becomes problematic. They can't easily
access their files, they might not be able to write to disk without applying
lax permissions or ACLs and even if they can, they will end up with an annoying
mix of files and directories owned by ``u1000`` and ``u1125``.

Idmapped mounts make it possible to solve this problem. A user can create an
idmapped mount for their home directory on their work computer or their
computer at home depending on what ownership they would prefer to end up on the
portable storage itself.

Let's assume they want all files on disk to belong to ``u1000``. When the user
plugs in their portable storage at their workstation they can set up a job that
creates an idmapped mount with the minimal idmapping ``u1000:k1125:r1``. So now
when they create a file the kernel performs the following steps we already know
from above::

 caller id:            u1125
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:k1125:r1

1. Map the caller's userspace ids into kernel ids in the caller's idmapping::

    make_kuid(u0:k0:r4294967295, u1125) = k1125

2. Translate the caller's kernel id into a kernel id in the filesystem's
   idmapping::

    mapped_fsuid(k1125):
      /* Map the kernel id up into a userspace id in the mount's idmapping.
      */
      from_kuid(u1000:k1125:r1, k1125) = u1000

      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000

3. Verify that the caller's kernel ids can be mapped to userspace ids in the
   filesystem's idmapping::

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So ultimately the file will be created with ``u1000`` on disk.

Now let's briefly look at what ownership the caller with id ``u1125`` will see
on their work computer:

::

 file id:              u1000
 caller idmapping:     u0:k0:r4294967295
 filesystem idmapping: u0:k0:r4294967295
 mount idmapping:      u1000:k1125:r1

1. Map the userspace id on disk down into a kernel id in the filesystem's
   idmapping::

    make_kuid(u0:k0:r4294967295, u1000) = k1000

2. Translate the kernel id into a kernel id in the mount's idmapping::

    i_uid_into_mnt(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000

      /* Map the userspace id down into a kernel id in the mount's idmapping. */
      make_kuid(u1000:k1125:r1, u1000) = k1125

3. Map the kernel id up into a userspace id in the caller's idmapping::

    from_kuid(u0:k0:r4294967295, k1125) = u1125

So ultimately the kernel will report to the caller that the file belongs to
``u1125``, which is the caller's userspace id on their workstation in our
example.

The raw userspace id that is put on disk is ``u1000`` so when the user takes
their home directory back to their home computer where they are assigned
``u1000`` using the initial idmapping and mounts the filesystem with the
initial idmapping they will see all those files owned by ``u1000``.
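
The home directory scenario can be checked end to end with a small Python
model. It is a sketch rather than kernel code: ``Idmap``, ``uid_on_disk()``
and ``uid_seen()`` are hypothetical names introduced here, the remaining
helpers only mimic the behaviour described in this document, and ``None``
represents an unmapped id.

```python
from typing import NamedTuple, Optional


class Idmap(NamedTuple):
    """Hypothetical model of one u:k:r idmapping extent."""
    u: int
    k: int
    r: int


def make_kuid(m: Idmap, uid: Optional[int]) -> Optional[int]:
    """Map a userspace id down into a kernel id; None means unmapped."""
    if uid is None or not (m.u <= uid < m.u + m.r):
        return None
    return m.k + (uid - m.u)


def from_kuid(m: Idmap, kuid: Optional[int]) -> Optional[int]:
    """Map a kernel id up into a userspace id; None means unmapped."""
    if kuid is None or not (m.k <= kuid < m.k + m.r):
        return None
    return m.u + (kuid - m.k)


def uid_on_disk(caller_uid: int, caller: Idmap, mnt: Idmap, fs: Idmap) -> Optional[int]:
    """Id written to disk when caller_uid creates a file through the idmapped mount."""
    kuid = make_kuid(caller, caller_uid)
    fs_kuid = make_kuid(fs, from_kuid(mnt, kuid))   # mapped_fsuid() step
    return from_kuid(fs, fs_kuid)


def uid_seen(disk_uid: int, fs: Idmap, mnt: Idmap, caller: Idmap) -> Optional[int]:
    """Owner reported to the caller for a file with disk_uid on the idmapped mount."""
    kuid = make_kuid(fs, disk_uid)
    mnt_kuid = make_kuid(mnt, from_kuid(fs, kuid))  # i_uid_into_mnt() step
    return from_kuid(caller, mnt_kuid)


INITIAL = Idmap(0, 0, 4294967295)   # u0:k0:r4294967295
WORK_MOUNT = Idmap(1000, 1125, 1)   # u1000:k1125:r1

written = uid_on_disk(1125, INITIAL, WORK_MOUNT, INITIAL)   # u1000
seen = uid_seen(written, INITIAL, WORK_MOUNT, INITIAL)      # u1125
other = uid_on_disk(1000, INITIAL, WORK_MOUNT, INITIAL)     # None
```

Because the mount idmapping covers exactly one id, only the caller with
``u1125`` is remapped. In this model any other caller id, such as ``u1000`` on
the work computer, has no mapping in the mount's idmapping and comes back as
``None``.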