Documentation/devicetree/usage-model.txt

Linux and the Device Tree
-------------------------
The Linux usage model for device tree data

Author: Grant Likely <grant.likely@secretlab.ca>

This article describes how Linux uses the device tree.  An overview of
the device tree data format can be found on the device tree usage page
at devicetree.org[1].

[1] http://devicetree.org/Device_Tree_Usage

The "Open Firmware Device Tree", or simply Device Tree (DT), is a data
structure and language for describing hardware.  More specifically, it
is a description of hardware that is readable by an operating system
so that the operating system doesn't need to hard code details of the
machine.

Structurally, the DT is a tree, or acyclic graph with named nodes, and
nodes may have an arbitrary number of named properties encapsulating
arbitrary data.  A mechanism also exists to create arbitrary
links from one node to another outside of the natural tree structure.

Conceptually, a common set of usage conventions, called 'bindings',
is defined for how data should appear in the tree to describe typical
hardware characteristics including data busses, interrupt lines, GPIO
connections, and peripheral devices.

As much as possible, hardware is described using existing bindings to
maximize use of existing support code, but since property and node
names are simply text strings, it is easy to extend existing bindings
or create new ones by defining new nodes and properties.  Be wary,
however, of creating a new binding without first doing some homework
about what already exists.  There are currently two different,
incompatible, bindings for i2c busses that came about because the new
binding was created without first investigating how i2c devices were
already being enumerated in existing systems.

1. History
----------
The DT was originally created by Open Firmware as part of the
communication method for passing data from Open Firmware to a client
program (such as an operating system).  An operating system used the
Device Tree to discover the topology of the hardware at runtime, and
thereby support a majority of available hardware without hard coded
information (assuming drivers were available for all devices).

Since Open Firmware is commonly used on PowerPC and SPARC platforms,
the Linux support for those architectures has for a long time used the
Device Tree.

In 2005, when PowerPC Linux began a major cleanup and the merging of
32-bit and 64-bit support, the decision was made to require DT support
on all powerpc platforms, regardless of whether or not they used Open
Firmware.  To do this, a DT representation called the Flattened Device
Tree (FDT) was created which could be passed to the kernel as a binary
blob without requiring a real Open Firmware implementation.  U-Boot,
kexec, and other bootloaders were modified to support both passing a
Device Tree Binary (dtb) and modifying a dtb at boot time.  DT was
also added to the PowerPC boot wrapper (arch/powerpc/boot/*) so that
a dtb could be wrapped up with the kernel image to support booting
existing non-DT aware firmware.

Some time later, FDT infrastructure was generalized to be usable by
all architectures.  At the time of this writing, 6 mainlined
architectures (arm, microblaze, mips, powerpc, sparc, and x86) and 1
out of mainline (nios) have some level of DT support.

2. Data Model
-------------
If you haven't already read the Device Tree Usage[1] page,
then go read it now.  It's okay, I'll wait....

2.1 High Level View
-------------------
The most important thing to understand is that the DT is simply a data
structure that describes the hardware.  There is nothing magical about
it, and it doesn't magically make all hardware configuration problems
go away.  What it does do is provide a language for decoupling the
hardware configuration from the board and device driver support in the
Linux kernel (or any other operating system for that matter).  Using
it allows board and device support to become data driven; to make
setup decisions based on data passed into the kernel instead of on
per-machine hard coded selections.

Ideally, data driven platform setup should result in less code
duplication and make it easier to support a wide range of hardware
with a single kernel image.

Linux uses DT data for three major purposes:
1) platform identification,
2) runtime configuration, and
3) device population.

2.2 Platform Identification
---------------------------
First and foremost, the kernel will use data in the DT to identify the
specific machine.  In a perfect world, the specific platform shouldn't
matter to the kernel because all platform details would be described
perfectly by the device tree in a consistent and reliable manner.
Hardware is not perfect though, and so the kernel must identify the
machine during early boot so that it has the opportunity to run
machine-specific fixups.

In the majority of cases, the machine identity is irrelevant, and the
kernel will instead select setup code based on the machine's core
CPU or SoC.  On ARM for example, setup_arch() in
arch/arm/kernel/setup.c will call setup_machine_fdt() in
arch/arm/kernel/devicetree.c which searches through the machine_desc
table and selects the machine_desc which best matches the device tree
data.  It determines the best match by looking at the 'compatible'
property in the root device tree node, and comparing it with the
dt_compat list in struct machine_desc.

The 'compatible' property contains a sorted list of strings starting
with the exact name of the machine, followed by an optional list of
boards it is compatible with sorted from most compatible to least.  For
example, the root compatible properties for the TI BeagleBoard and its
successor, the BeagleBoard xM board might look like:

        compatible = "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3";
        compatible = "ti,omap3-beagleboard-xm", "ti,omap3450", "ti,omap3";

Here "ti,omap3-beagleboard-xm" specifies the exact model, and the
remaining entries claim that the board is compatible with the OMAP 3450
SoC and with the omap3 family of SoCs in general.  You'll notice that
the list is sorted from most specific (exact board) to least specific
(SoC family).

Astute readers might point out that the Beagle xM could also claim
compatibility with the original Beagle board.  However, one should be
cautioned about doing so at the board level since there is typically a
high level of change from one board to another, even within the same
product line, and it is hard to nail down exactly what is meant when one
board claims to be compatible with another.  For the top level, it is
better to err on the side of caution and not claim one board is
compatible with another.  The notable exception would be when one
board is a carrier for another, such as a CPU module attached to a
carrier board.

One more note on compatible values.  Any string used in a compatible
property must be documented as to what it indicates.  Add
documentation for compatible strings in Documentation/devicetree/bindings.

Again on ARM, for each machine_desc, the kernel looks to see if
any of the dt_compat list entries appear in the compatible property.
If one does, then that machine_desc is a candidate for driving the
machine.  After searching the entire table of machine_descs,
setup_machine_fdt() returns the 'most compatible' machine_desc based
on which entry in the compatible property each machine_desc matches
against.  If no matching machine_desc is found, then it returns NULL.
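
The selection rule above can be sketched in plain userspace C.  This is
only an illustration under stated assumptions, not the kernel's
implementation: struct machine_desc_sketch, match_score() and
select_machine() are hypothetical names, and a "score" is simply the
position in the root compatible list at which a machine_desc first
matches (lower means more specific).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct machine_desc. */
struct machine_desc_sketch {
        const char *name;
        const char *const *dt_compat;   /* NULL-terminated list */
};

/*
 * Return the index of the first entry of the root 'compatible' list
 * that also appears in dt_compat, or -1 if none does.  A lower index
 * means a more specific match.
 */
static int match_score(const char *const *root_compat,
                       const char *const *dt_compat)
{
        for (int i = 0; root_compat[i]; i++)
                for (int j = 0; dt_compat[j]; j++)
                        if (strcmp(root_compat[i], dt_compat[j]) == 0)
                                return i;
        return -1;
}

/* Scan the whole table and keep the candidate with the lowest score. */
static const struct machine_desc_sketch *
select_machine(const struct machine_desc_sketch *table, size_t n,
               const char *const *root_compat)
{
        const struct machine_desc_sketch *best = NULL;
        int best_score = -1;

        assert(table != NULL && root_compat != NULL);
        for (size_t k = 0; k < n; k++) {
                int score = match_score(root_compat, table[k].dt_compat);

                if (score >= 0 && (best == NULL || score < best_score)) {
                        best = &table[k];
                        best_score = score;
                }
        }
        return best;    /* NULL when nothing matches */
}
```

With a table holding one entry that lists "ti,omap3" and another that
lists only "ti,omap3-beagleboard", the original BeagleBoard root node
selects the board-specific entry (it matches at position 0 rather than
position 2), while the xM root node falls through to the "ti,omap3"
entry.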

The reasoning behind this scheme is the observation that in the majority
of cases, a single machine_desc can support a large number of boards
if they all use the same SoC, or same family of SoCs.  However,
invariably there will be some exceptions where a specific board will
require special setup code that is not useful in the generic case.
Special cases could be handled by explicitly checking for the
troublesome board(s) in generic setup code, but doing so very quickly
becomes ugly and/or unmaintainable if it is more than just a couple of
cases.

Instead, the compatible list allows a generic machine_desc to provide
support for a wide common set of boards by specifying "less
compatible" values in the dt_compat list.  In the example above,
generic board support can claim compatibility with "ti,omap3" or
"ti,omap3450".  If a bug was discovered on the original beagleboard
that required special workaround code during early boot, then a new
machine_desc could be added which implements the workarounds and only
matches on "ti,omap3-beagleboard".

PowerPC uses a slightly different scheme where it calls the .probe()
hook from each machine_desc, and the first one returning TRUE is used.
However, this approach does not take into account the priority of the
compatible list, and probably should be avoided for new architecture
support.

2.3 Runtime configuration
-------------------------
In most cases, a DT will be the sole method of communicating data from
firmware to the kernel, so it also gets used to pass in runtime and
configuration data like the kernel parameters string and the location
of an initrd image.

Most of this data is contained in the /chosen node, and when booting
Linux it will look something like this:

        chosen {
                bootargs = "console=ttyS0,115200 loglevel=8";
                initrd-start = <0xc8000000>;
                initrd-end = <0xc8200000>;
        };

The bootargs property contains the kernel arguments, and the initrd-*
properties define the start and end addresses of an initrd blob.  Note
that initrd-end is the first address after the initrd image, so this
doesn't match the usual semantic of struct resource.  The chosen node
may also optionally contain an arbitrary number of additional
properties for platform-specific configuration data.
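
To make the initrd-end convention concrete, here is a small userspace
sketch.  The helper names initrd_size() and initrd_last_byte() are
hypothetical and exist only for this illustration.  Because initrd-end
is exclusive, the size is a plain subtraction, and the inclusive end
address that struct resource would carry is one byte lower.

```c
#include <assert.h>
#include <stdint.h>

/* initrd-end is the first address after the image, so no "+ 1" here. */
static uint64_t initrd_size(uint64_t start, uint64_t end)
{
        assert(end >= start);
        return end - start;
}

/* Address of the last byte of the image (inclusive end). */
static uint64_t initrd_last_byte(uint64_t start, uint64_t end)
{
        assert(end > start);
        return end - 1;
}
```

For the /chosen example above, initrd_size(0xc8000000, 0xc8200000) is
0x200000 (2 MiB) and the last byte of the image sits at 0xc81fffff.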

During early boot, the architecture setup code calls of_scan_flat_dt()
several times with different helper callbacks to parse device tree
data before paging is set up.  The of_scan_flat_dt() code scans through
the device tree and uses the helpers to extract information required
during early boot.  Typically the early_init_dt_scan_chosen() helper
is used to parse the chosen node including kernel parameters,
early_init_dt_scan_root() to initialize the DT address space model,
and early_init_dt_scan_memory() to determine the size and
location of usable RAM.
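
The scan-with-callback pattern can be sketched in userspace C as
follows.  Everything here is an assumption made for illustration:
struct flat_node, scan_flat() and scan_chosen() are hypothetical names,
the "flattened tree" is just an array of records, and the convention
that a non-zero return value stops the scan is a choice of this sketch
rather than a statement about the kernel's helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view: one record per node, in tree order. */
struct flat_node {
        const char *name;
        int depth;              /* 0 is the root node */
        const char *bootargs;   /* only set on the chosen node */
};

/* Callback type: return non-zero to stop the scan. */
typedef int (*scan_cb)(const struct flat_node *node, void *data);

/* Visit every node in order until a callback asks to stop. */
static int scan_flat(const struct flat_node *nodes, size_t n,
                     scan_cb cb, void *data)
{
        int rc = 0;

        assert(cb != NULL);
        for (size_t i = 0; i < n && !rc; i++)
                rc = cb(&nodes[i], data);
        return rc;
}

/* Helper in the spirit of early_init_dt_scan_chosen(): grab bootargs. */
static int scan_chosen(const struct flat_node *node, void *data)
{
        const char **cmdline = data;

        if (node->depth != 1 || strcmp(node->name, "chosen") != 0)
                return 0;       /* not the node we want, keep scanning */
        *cmdline = node->bootargs;
        return 1;               /* found it, stop the scan */
}
```

The same scan_flat() routine can then be called again with a different
helper, for example one that looks for the memory node, which mirrors
the "several times with different helper callbacks" usage described
above.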

On ARM, the function setup_machine_fdt() is responsible for early
scanning of the device tree after selecting the correct machine_desc
that supports the board.

2.4 Device population
---------------------
After the board has been identified, and after the early configuration data
has been parsed, then kernel initialization can proceed in the normal
way.  At some point in this process, unflatten_device_tree() is called
to convert the data into a more efficient runtime representation.
This is also when machine-specific setup hooks will get called, like
the machine_desc .init_early(), .init_irq() and .init_machine() hooks
on ARM.  The remainder of this section uses examples from the ARM
implementation, but all architectures will do pretty much the same
thing when using a DT.

As can be guessed by the names, .init_early() is used for any
machine-specific setup that needs to be executed early in the boot
process, and .init_irq() is used to set up interrupt handling.  Using a
DT doesn't materially change the behaviour of either of these functions.
If a DT is provided, then both .init_early() and .init_irq() are able
to call any of the DT query functions (of_* in include/linux/of*.h) to
get additional data about the platform.

The most interesting hook in the DT context is .init_machine() which
is primarily responsible for populating the Linux device model with
data about the platform.  Historically this has been implemented on
embedded platforms by defining a set of static clock structures,
platform_devices, and other data in the board support .c file, and
registering it en-masse in .init_machine().  When DT is used, then
instead of hard coding static devices for each platform, the list of
devices can be obtained by parsing the DT, and allocating device
structures dynamically.

The simplest case is when .init_machine() is only responsible for
registering a block of platform_devices.  A platform_device is a concept
used by Linux for memory or I/O mapped devices which cannot be detected
by hardware, and for 'composite' or 'virtual' devices (more on those
later).  While there is no 'platform device' terminology for the DT,
platform devices roughly correspond to device nodes at the root of the
tree and children of simple memory mapped bus nodes.

About now is a good time to lay out an example.  Here is part of the
device tree for the NVIDIA Tegra board.

/{
        compatible = "nvidia,harmony", "nvidia,tegra20";
        #address-cells = <1>;
        #size-cells = <1>;
        interrupt-parent = <&intc>;

        chosen { };
        aliases { };

        memory {
                device_type = "memory";
                reg = <0x00000000 0x40000000>;
        };

        soc {
                compatible = "nvidia,tegra20-soc", "simple-bus";
                #address-cells = <1>;
                #size-cells = <1>;
                ranges;

                intc: interrupt-controller@50041000 {
                        compatible = "nvidia,tegra20-gic";
                        interrupt-controller;
                        #interrupt-cells = <1>;
                        reg = <0x50041000 0x1000>, <0x50040100 0x0100>;
                };

                serial@70006300 {
                        compatible = "nvidia,tegra20-uart";
                        reg = <0x70006300 0x100>;
                        interrupts = <122>;
                };

                i2s1: i2s@70002800 {
                        compatible = "nvidia,tegra20-i2s";
                        reg = <0x70002800 0x100>;
                        interrupts = <77>;
                        codec = <&wm8903>;
                };

                i2c@7000c000 {
                        compatible = "nvidia,tegra20-i2c";
                        #address-cells = <1>;
                        #size-cells = <0>;
                        reg = <0x7000c000 0x100>;
                        interrupts = <70>;

                        wm8903: codec@1a {
                                compatible = "wlf,wm8903";
                                reg = <0x1a>;
                                interrupts = <347>;
                        };
                };
        };

        sound {
                compatible = "nvidia,harmony-sound";
                i2s-controller = <&i2s1>;
                i2s-codec = <&wm8903>;
        };
};

At .init_machine() time, Tegra board support code will need to look at
this DT and decide which nodes to create platform_devices for.
However, looking at the tree, it is not immediately obvious what kind
of device each node represents, or even if a node represents a device
at all.  The /chosen, /aliases, and /memory nodes are informational
nodes that don't describe devices (although arguably memory could be
considered a device).  The children of the /soc node are memory mapped
devices, but the codec@1a is an i2c device, and the sound node
represents not a device, but rather how other devices are connected
together to create the audio subsystem.  I know what each device is
because I'm familiar with the board design, but how does the kernel
know what to do with each node?

The trick is that the kernel starts at the root of the tree and looks
for nodes that have a 'compatible' property.  First, it is generally
assumed that any node with a 'compatible' property represents a device
of some kind, and second, it can be assumed that any node at the root
of the tree is either directly attached to the processor bus, or is a
miscellaneous system device that cannot be described any other way.
For each of these nodes, Linux allocates and registers a
platform_device, which in turn may get bound to a platform_driver.

Why is using a platform_device for these nodes a safe assumption?
Well, for the way that Linux models devices, just about all bus_types
assume that their devices are children of a bus controller.  For
example, each i2c_client is a child of an i2c_master.  Each spi_device
is a child of an SPI bus.  Similarly for USB, PCI, MDIO, etc.  The
same hierarchy is also found in the DT, where I2C device nodes only
ever appear as children of an I2C bus node.  Ditto for SPI, MDIO, USB,
etc.  The only devices which do not require a specific type of parent
device are platform_devices (and amba_devices, but more on that
later), which will happily live at the base of the Linux /sys/devices
tree.  Therefore, if a DT node is at the root of the tree, then it
is probably best registered as a platform_device.

Linux board support code calls of_platform_populate(NULL, NULL, NULL, NULL)
to kick off discovery of devices at the root of the tree.  The
parameters are all NULL because when starting from the root of the
tree, there is no need to provide a starting node (the first NULL), a
parent struct device (the last NULL), and we're not using a match
table (yet).  For a board that only needs to register devices,
.init_machine() can be completely empty except for the
of_platform_populate() call.

In the Tegra example, this accounts for the /soc and /sound nodes, but
what about the children of the SoC node?  Shouldn't they be registered
as platform devices too?  For Linux DT support, the generic behaviour
is for child devices to be registered by the parent's device driver at
driver .probe() time.  So, an i2c bus device driver will register an
i2c_client for each child node, an SPI bus driver will register
its spi_device children, and similarly for other bus_types.
According to that model, a driver could be written that binds to the
SoC node and simply registers platform_devices for each of its
children.  The board support code would allocate and register an SoC
device, a (theoretical) SoC device driver could bind to the SoC device,
and register platform_devices for /soc/interrupt-controller, /soc/serial,
/soc/i2s, and /soc/i2c in its .probe() hook.  Easy, right?

Actually, it turns out that registering children of some
platform_devices as more platform_devices is a common pattern, and the
device tree support code reflects that and makes the above example
simpler.  The second argument to of_platform_populate() is an
of_device_id table, and any node that matches an entry in that table
will also get its child nodes registered.  In the Tegra case, the code
can look something like this:

static void __init harmony_init_machine(void)
{
        /* ... */
        of_platform_populate(NULL, of_default_bus_match_table, NULL, NULL);
}

"simple-bus" is defined in the ePAPR 1.0 specification as a compatible
value meaning a simple memory mapped bus, so the of_platform_populate()
code could be written to just assume simple-bus compatible nodes will
always be traversed.  However, we pass it in as an argument so that
board support code can always override the default behaviour.
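
The traversal rule can be modelled in a few lines of userspace C.  This
is a sketch under stated assumptions and not the kernel code: struct
node, is_compatible() and populate() are hypothetical names, "creating
a device" is reduced to recording the node name, and the bus table is a
plain NULL-terminated string list instead of an of_device_id table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_COMPAT      2
#define MAX_CHILDREN    8

/* Hypothetical in-memory node; the kernel uses struct device_node. */
struct node {
        const char *name;
        const char *compatible[MAX_COMPAT + 1];         /* NULL-terminated */
        const struct node *children[MAX_CHILDREN + 1];  /* NULL-terminated */
};

/* Non-zero if any of the node's compatible strings is in the table. */
static int is_compatible(const struct node *np, const char *const *table)
{
        for (int i = 0; np->compatible[i]; i++)
                for (int j = 0; table[j]; j++)
                        if (strcmp(np->compatible[i], table[j]) == 0)
                                return 1;
        return 0;
}

/*
 * Record one "device" per child of 'parent' that has a compatible
 * property, and descend into children that match bus_table.  Names
 * are stored in created[]; the updated count is returned.
 */
static size_t populate(const struct node *parent,
                       const char *const *bus_table,
                       const char **created, size_t count, size_t max)
{
        for (int i = 0; parent->children[i]; i++) {
                const struct node *np = parent->children[i];

                if (!np->compatible[0])
                        continue;       /* no 'compatible': not a device */
                assert(count < max);
                created[count++] = np->name;
                if (bus_table && is_compatible(np, bus_table))
                        count = populate(np, bus_table, created, count, max);
        }
        return count;
}
```

Applied to a model of the Tegra tree, a NULL bus table records only soc
and sound.  Passing a table containing "simple-bus" additionally
records the four children of soc, but not codec@1a, because the i2c
node is not a simple-bus and its child is left to the i2c bus driver.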

[Need to add discussion of adding i2c/spi/etc child devices]

Appendix A: AMBA devices
------------------------

ARM Primecells are a certain kind of device attached to the ARM AMBA
bus which include some support for hardware detection and power
management.  In Linux, struct amba_device and the amba_bus_type are
used to represent Primecell devices.  However, the fiddly bit is that
not all devices on an AMBA bus are Primecells, and for Linux it is
typical for both amba_device and platform_device instances to be
siblings of the same bus segment.

When using the DT, this creates problems for of_platform_populate()
because it must decide whether to register each node as either a
platform_device or an amba_device.  This unfortunately complicates the
device creation model a little bit, but the solution turns out not to
be too invasive.  If a node is compatible with "arm,amba-primecell", then
of_platform_populate() will register it as an amba_device instead of a
platform_device.
 413 of_platform_populate() will register it as an amba_device instead of a
 414 platform_device.