        Configurable Module Fail Over
        -----------------------------

Before configurable module failover, we had this kind of entry in
"radiusd.conf":

#---
authorize {
  preprocess
  files
}
#---

  This entry instructed the "authorize" section to first process the
request through the "preprocess" module, and if that returned success,
to process it through the "files" module.  If that sequence returned
success, then the "authorize" stage itself would return success.
Processing was strictly linear and if one module failed, the whole
section would fail immediately.

  Configurable failover provides more flexibility. It takes advantage
of the tree structure of radiusd.conf to support a configuration
language that allows you to "group" modules that should work together
in ways other than simple lists.  You can control the flow of any
stage (e.g. "authorize") to fit your needs, without touching C code,
just by altering radiusd.conf.

  This configurable fail-over has a convenient short-hand, too.
Administrators commonly want to say things like "try SQL1, if it's
down, try SQL2, otherwise drop the request."

  For example:

#---
  modules {
    sql sql1 {
      # configuration to connect to SQL database one
    }
    sql sql2 {
      # configuration to connect to SQL database two
    }
    always handled {
      rcode = handled
    }
  }

  #  Handle accounting packets
  accounting {
      detail                    # always log to detail, stopping if it fails
      redundant {
        sql1                    # try module sql1
        sql2                    # if that's down, try module sql2
        handled                 # otherwise drop the request as
                                # it's been "handled" by the "always"
                                # module (see doc/rlm_always)
      }
  }
#---

  The "redundant" section is a configuration directive which tells the
server to process the second module if the first one fails.  Any
number of modules can be listed in a "redundant" section.  The server
will process each in turn, until one of the modules succeeds.  It will
then stop processing the "redundant" list.
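
  The behavior of a "redundant" list can be modelled in a few lines of
C.  This is a simplified sketch, not the server's own code: the
"rcode" enum, the "module_fn" type and the stub modules are
hypothetical names invented for illustration, and a real module is
also passed the request.

```c
#include <stddef.h>

/* Hypothetical result codes, in the order of the table in section 2. */
typedef enum {
    RC_NOTFOUND, RC_NOOP, RC_OK, RC_UPDATED, RC_FAIL,
    RC_REJECT, RC_USERLOCK, RC_INVALID, RC_HANDLED
} rcode;

/* A module is modelled as a function with no arguments (the request is
 * omitted to keep the sketch small). */
typedef rcode (*module_fn)(void);

/* Try each module in turn and stop at the first one that does not
 * fail.  If every module fails, the list as a whole returns RC_FAIL. */
static rcode run_redundant(const module_fn *mods, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        rcode rc = mods[i]();
        if (rc != RC_FAIL)
            return rc;          /* stop processing the list here */
    }
    return RC_FAIL;
}

/* Hypothetical stubs: a database that is down, one that is up, and a
 * stand-in for the "always handled" module. */
static rcode db_down(void)        { return RC_FAIL; }
static rcode db_up(void)          { return RC_OK; }
static rcode always_handled(void) { return RC_HANDLED; }
```

With sql1 down and sql2 up, the list returns "ok"; with both down, the
final "handled" entry decides the result.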

1. Rewriting results for single modules
   ------------------------------------

  Normally, when a module fails, the entire section ("authorize",
"accounting", etc.) stops being processed.  In some cases, we may want
to permit "soft failures".  That is, we may want to tell the server
that it is "ok" for a module to fail, and that the failure should not
be treated as a fatal error.

  In this case, the module is treated as a "section", rather than just
as a single line in "radiusd.conf".  The configuration entries for
that section are taken from the "configurable fail-over" code, and not
from the configuration information for that module.

  For example, the "detail" module normally returns "fail" if it is
unable to write its information to the "detail" file.  As a test, we
can configure the server so that it continues processing the request,
even if the "detail" module fails.  The following example shows how:

#--
  #  Handle accounting packets
  accounting {
      detail {
        fail = 1
      }
      redundant {
        sql1
        sql2
        handled
      }
  }
#--

  The "fail = 1" entry tells the server to remember the "fail" code,
with priority "1".  The normal configuration is "fail = return", which
means "if the detail module fails, stop processing the accounting
section".

2. Fail-over configuration entries
   -------------------------------

  Modules normally return one of the following codes as their result:

        Code            Meaning
        ----            -------
        notfound        the user was not found
        noop            the module did nothing
        ok              the module succeeded
        updated         the module updated information in the request
        fail            the module failed
        reject          the module rejected the user
        userlock        the user was locked out
        invalid         the user's configuration entry was invalid
        handled         the module has done everything to handle the request

  In a configurable fail-over section, each of these codes may be
listed, with a value.  If the code is not listed, or a configurable
fail-over section is not defined, then values that make sense for the
requested "group" (group, redundant, load-balance, etc.) are used.

  The special code "default" can be used to set all return codes to
the specified value.  This value will be used with a lower priority
than ones that are explicitly set.

  The values for each code may be one of the following:

        Value           Meaning
        -----           -------
        <number>        Priority for this return code.
        return          Stop processing this configurable fail-over list.
        reject          Stop processing this configurable fail-over list
                        and immediately return a reject.

  The <number> used for a value may be any decimal number between 1
and 99999.  The number is used when processing a list of modules, to
determine which code is returned from the list.  For example, if
"module1" returns "fail" with priority "1", and a later "module2"
returns "ok" with priority "3", the return code from the list of
modules will be "ok", because it has higher priority than "fail".

  This configurability allows the administrator to permit some modules
to fail, so long as a later module succeeds.
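
  The priority rule can be sketched in C as "keep the code with the
highest priority seen so far".  The names are hypothetical, and the
sketch assumes that every code in the list has a numeric priority (no
"return" or "reject"), and that on a tie the earlier code is kept.

```c
#include <stddef.h>

/* Hypothetical, trimmed set of result codes. */
typedef enum { RC_NOTFOUND, RC_NOOP, RC_OK, RC_UPDATED, RC_FAIL } rcode;

/* What one module in the list returned, and the priority configured
 * for that code (1..99999). */
typedef struct {
    rcode code;
    int priority;
} step;

/* Return the highest-priority code in the list.  `initial` is what an
 * empty list returns; it starts at priority 0, so any configured
 * priority beats it.  A tie keeps the earlier code (an assumption of
 * this sketch, because the comparison is strictly greater-than). */
static rcode pick_result(const step *steps, size_t n, rcode initial)
{
    rcode best = initial;
    int best_prio = 0;
    for (size_t i = 0; i < n; i++) {
        if (steps[i].priority > best_prio) {
            best = steps[i].code;
            best_prio = steps[i].priority;
        }
    }
    return best;
}
```

For the example above, the list { fail with priority 1, ok with
priority 3 } gives "ok", whichever order the two modules ran in.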


3. More Complex Configurations
   ---------------------------

  The "authorize" section is normally a list of module names.  We can
create sub-lists by using the section name "group".  The "redundant"
section above is just a short-hand for "group", with a set of default
return codes, which are different from the normal "stop processing the
list on failure".

  For example, we can configure two detail modules, and allow either
to fail, so long as one of them succeeds.

#--
  #  Handle accounting packets
  accounting {
      group {
        detail1 {
          fail = 1              # remember "fail" with priority 1
          ok = return           # if we succeed, don't do "detail2"
        }
        detail2 {
          fail = 1              # remember "fail" with priority 1
          ok = return           # if we succeed, return "ok"
                                # if "detail1" returned "fail"
        }
      }                 # returns "fail" only if BOTH modules returned "fail"
      redundant {
        sql1
        sql2
        handled
      }
  }
#--

  This configuration says:

        Log to "detail1", and stop processing the "group" list if
        "detail1" returned "ok".

        If "detail1" returned "fail", then continue, but remember the
        "fail" code, with priority 1.

        If "detail2" fails, then remember "fail" with priority 1.

        If "detail2" returned "ok", return "ok" from the "group".

  The return code from the "group" is the return code which was either
forced to return (e.g. "ok" for "detail1"), or the highest priority
return code found by processing the list.

  This process can be extended to any number of modules listed in a
"group" section.
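
  The "group" above can be modelled in C as a loop over its children,
each carrying its own table of actions.  This is a hypothetical
sketch, not the server's implementation: the names are invented,
codes that are not listed simply continue the loop, and the group
starts from "noop" with priority 0.

```c
#include <stddef.h>

/* Hypothetical, trimmed set of result codes. */
typedef enum { RC_NOOP, RC_OK, RC_FAIL, RC_NUMCODES } rcode;

#define ACT_RETURN (-1)             /* stands for "= return" */

typedef struct {
    rcode (*call)(void);            /* the module (request omitted) */
    int actions[RC_NUMCODES];       /* a priority, or ACT_RETURN */
} child;

/* detail1 and detail2 both use: fail = 1, ok = return. */
static child detail_child(rcode (*fn)(void))
{
    child c = { fn, { 0 } };        /* unlisted codes: 0, just continue */
    c.actions[RC_FAIL] = 1;
    c.actions[RC_OK] = ACT_RETURN;
    return c;
}

/* Run the children in order.  "return" ends the group with that code;
 * a priority is remembered only if it beats the best one so far. */
static rcode run_group(const child *kids, size_t n)
{
    rcode best = RC_NOOP;
    int best_prio = 0;
    for (size_t i = 0; i < n; i++) {
        rcode rc = kids[i].call();
        int act = kids[i].actions[rc];
        if (act == ACT_RETURN)
            return rc;
        if (act > best_prio) {
            best = rc;
            best_prio = act;
        }
    }
    return best;
}

/* Hypothetical stub "detail" modules; `calls` counts how many ran. */
static int calls;
static rcode log_ok(void)   { calls++; return RC_OK; }
static rcode log_fail(void) { calls++; return RC_FAIL; }
```

If "detail1" succeeds, "detail2" is never called; the group returns
"fail" only when both stubs return "fail".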


4. More Complex Configuration using "if" and "else"
   ------------------------------------------------

  As of version 2.0, the server allows "if"-style checking in the
configuration sections.  The section is still processed as a list, so
there is no looping or "goto" support.  But by using "if", the
administrator can have branching paths of execution, where none was
possible before.

  For example, the following configuration says "run sql, if it
returns notfound, run ldap1, else run ldap2".

  authorize {
        ...
        sql
        if notfound {
                ldap1
        }
        else {
                ldap2
        }
  }

  Note that the parser is easily confused.  The words "if" and "else"
MUST be the first entry on the line.  Using statements like "} else {"
is forbidden, and will prevent the server from starting.  Putting
parentheses around the condition, as in "if (notfound)", won't work,
either.

  If you want to specify multiple conditions, put them in double
quotes, and separate the conditions by a single '|' character, as
follows:

        if "notfound | ok | fail" {
                ...
        }

  The reason for these limitations is that the "if" conditions are
overloading module names, and therefore have to follow a similar
syntax.  These limitations may be removed in a future release.

  The conditions that can be checked are the names listed in section
2, above.  Any other condition is not permitted.
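
  A condition such as "notfound | ok | fail" can be read as a list of
return-code names, any one of which must equal the code returned by
the last module.  The C sketch below is a hypothetical, lenient
parser written only for illustration; it is not the server's own
parser.  It returns -1 for a name that is not one of the codes listed
in section 2.

```c
#include <stdbool.h>
#include <string.h>

/* The return-code names from section 2; anything else is rejected. */
static const char *const known[] = {
    "notfound", "noop", "ok", "updated", "fail",
    "reject", "userlock", "invalid", "handled"
};

/* Is the `len`-character token at `name` one of the known codes? */
static bool is_known(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
        if (strlen(known[i]) == len && strncmp(known[i], name, len) == 0)
            return true;
    return false;
}

/* Return 1 if `last` (the previous module's code) appears in `cond`,
 * 0 if it does not, and -1 if `cond` names an unknown code. */
static int condition_matches(const char *cond, const char *last)
{
    int hit = 0;
    const char *p = cond;
    while (*p) {
        while (*p == ' ' || *p == '|')          /* skip separators */
            p++;
        const char *start = p;
        while (*p && *p != ' ' && *p != '|')    /* scan one name */
            p++;
        size_t len = (size_t)(p - start);
        if (len == 0)
            break;                              /* trailing separators */
        if (!is_known(start, len))
            return -1;
        if (strlen(last) == len && strncmp(start, last, len) == 0)
            hit = 1;
    }
    return hit;
}
```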

  You can also use "elsif", as follows:

        if notfound {
                ldap1
        }
        elsif fail {
                ldap2
        }
        else {
                ldap3
        }

  Note that the condition being checked is the return code of the last
module or group that was executed.  This may sometimes have odd
side-effects:

        sql
        if notfound {
                ldap1
        }
        if fail {
                ldap2
        }

  In this case, the "ldap2" module will be executed if the "sql"
module returns "fail", OR if the "sql" module returns "notfound" and
the "ldap1" module then returns "fail".  For this reason, you should
probably use "elsif" whenever you have two "if" statements right after
one another.
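
  The "last result" rule is easy to trip over, so here it is as a C
sketch.  The stub modules are hypothetical: "sql" cannot find the
user, "ldap1" is down and "ldap2" succeeds.  Each function returns the
code of whatever ran last, which simplifies how the server actually
picks a section's result.

```c
/* Hypothetical, trimmed set of result codes. */
typedef enum { RC_NOTFOUND, RC_OK, RC_FAIL } rcode;

static int ldap2_runs;                  /* how often "ldap2" ran */

/* Stubs: sql cannot find the user, ldap1 is down, ldap2 succeeds. */
static rcode sql(void)   { return RC_NOTFOUND; }
static rcode ldap1(void) { return RC_FAIL; }
static rcode ldap2(void) { ldap2_runs++; return RC_OK; }

/*   sql
 *   if notfound { ldap1 }
 *   if fail { ldap2 }
 * Each "if" tests the code of whatever ran last. */
static rcode two_ifs(void)
{
    rcode last = sql();
    if (last == RC_NOTFOUND)
        last = ldap1();
    if (last == RC_FAIL)                /* sees ldap1's "fail" */
        last = ldap2();
    return last;
}

/*   sql
 *   if notfound { ldap1 }
 *   elsif fail { ldap2 }
 * At most one branch runs, so only sql's code is tested. */
static rcode if_elsif(void)
{
    rcode last = sql();
    if (last == RC_NOTFOUND)
        last = ldap1();
    else if (last == RC_FAIL)
        last = ldap2();
    return last;
}
```

With these stubs, "ldap2" runs once under two_ifs() and never under
if_elsif(), even though "sql" itself never returned "fail".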

  The "if" checks can be nested to a depth of 30 or so, which should
be sufficient for most configurations.


5. Virtual Modules
   ---------------

  Some configurations may require using the same list of modules, in
the same order, in multiple sections.  For those systems, the
configuration can be simplified through the use of "virtual" modules.
These modules are configured as named sub-sections of the
"instantiate" section, as follows:

        instantiate {
                ...

                redundant sql1_or_2 {
                        sql1
                        sql2
                }
        }

  The name "sql1_or_2" can then be used in any other section, such as
"authorize" or "accounting".  The result will be *exactly* as if that
section was placed at the location of the "sql1_or_2" reference.

  These virtual modules are full-fledged objects in and of themselves.
One virtual module can refer to another virtual module, and they can
contain "if" conditions, or any other configuration permitted in a
section.


6. Redundancy and Load-Balancing
   -----------------------------

  See doc/load-balance.txt for information on simple redundancy
(fail-over) and load balancing.


7. The Gory Details
   ----------------

The fundamental object is called a MODCALLABLE, because it is something that
can be passed a specific RADIUS request and returns one of the RLM_MODULE_*
results. It is a function - if you can accept the fact that pieces of
radiusd.conf are functions. There are two kinds of MODCALLABLEs: GROUPs and
SINGLEs.

A SINGLE is a reference to a module instance that was set up in the modules{}
section of radiusd.conf, like "preprocess" or "sql1". When a SINGLE is
called, the corresponding function in the rlm is invoked, and whichever
RLM_MODULE_* it returns becomes the RESULT of the SINGLE.

A GROUP is a section of radiusd.conf that includes some MODCALLABLEs.
Examples of GROUPs above include "authorize{...}", which implements the C
function module_authorize, and "redundant{...}", which contains SINGLEs
that refer to the redundant databases sql1 and sql2. Note that a GROUP can
contain other GROUPs - "Auth-Type SQL{...}" is also a GROUP, which implements
the C function module_authenticate when Auth-Type is set to SQL.

Now here's the fun part - what happens when a GROUP is called? It simply runs
through all of its children in order, and calls each one, whether it is
another GROUP or a SINGLE. It then looks at the RESULT of that child, and
takes some ACTION, which is basically either "return that RESULT immediately"
or "keep going". In the first example, any "bad" RESULT from the preprocess
module causes an immediate return, and any "good" RESULT causes the
authorize{...} GROUP to proceed to the files module.

We can see the exact rules by writing them out the long way:

authorize {
  preprocess {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    fail     = return
    reject   = return
    userlock = return
    invalid  = return
    handled  = return
  }
  files {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    fail     = return
    reject   = return
    userlock = return
    invalid  = return
    handled  = return
  }
}

This is the same as the first example, with the behavior explicitly
spelled out. Each SINGLE becomes its own section, containing a list of
RESULTs that it may return and what ACTION should follow from them. So
preprocess is called, and if it returns, for example, RLM_MODULE_REJECT,
then the reject=return rule is applied, and the authorize{...} GROUP
itself immediately returns RLM_MODULE_REJECT.

If preprocess returns RLM_MODULE_NOOP, the corresponding ACTION is "2". An
integer ACTION serves two purposes - first, it tells the parent GROUP to go
on to the next module. Second, it is a hint as to how desirable this RESULT
is as a candidate for the GROUP's own RESULT. So files is called... suppose
it returns RLM_MODULE_NOTFOUND. The ACTION for notfound inside the files{...}
block is "1". We have now reached the end of the authorize{...} GROUP and we
look at the RESULTs we accumulated along the way - there is a noop with
preference level 2, and a notfound with preference level 1, so the
authorize{...} GROUP as a whole returns RLM_MODULE_NOOP. That makes sense,
because to say the user was not found at all would be a lie, since preprocess
apparently found the user, or else it would have returned RLM_MODULE_NOTFOUND
too.
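
The walk-through above can be condensed into a C sketch of a GROUP call. It
is not the code in modcall.c: the names, the numeric values chosen for
"return" and "reject", and the starting RESULT are assumptions, and each
SINGLE is replaced by the RESULT it would return, which is enough to trace
the example.

```c
#include <stddef.h>

/* Hypothetical result codes, in the order of the table in section 2. */
typedef enum {
    RC_NOTFOUND, RC_NOOP, RC_OK, RC_UPDATED, RC_FAIL,
    RC_REJECT, RC_USERLOCK, RC_INVALID, RC_HANDLED, RC_NUMCODES
} rcode;

#define ACT_RETURN (-1)             /* "= return" (hypothetical value) */
#define ACT_REJECT (-2)             /* "= reject" (hypothetical value) */

/* The long-form authorize rules written out above. */
static const int authorize_actions[RC_NUMCODES] = {
    [RC_NOTFOUND] = 1, [RC_NOOP] = 2, [RC_OK] = 3, [RC_UPDATED] = 4,
    [RC_FAIL] = ACT_RETURN, [RC_REJECT] = ACT_RETURN,
    [RC_USERLOCK] = ACT_RETURN, [RC_INVALID] = ACT_RETURN,
    [RC_HANDLED] = ACT_RETURN,
};

/* Call a GROUP whose children all share one action table.  `results`
 * stands in for what each child SINGLE returns when called. */
static rcode call_group(const rcode *results, size_t n, const int *actions)
{
    rcode best = RC_NOOP;           /* starting RESULT, priority 0 */
    int best_prio = 0;
    for (size_t i = 0; i < n; i++) {
        rcode rc = results[i];
        int act = actions[rc];
        if (act == ACT_RETURN)
            return rc;              /* pass this RESULT straight up */
        if (act == ACT_REJECT)
            return RC_REJECT;
        if (act > best_prio) {      /* keep the most desirable RESULT */
            best = rc;
            best_prio = act;
        }
    }
    return best;
}
```

Feeding it { noop, notfound } reproduces the trace above and yields noop;
{ reject, ... } stops at the first child and yields reject.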

We could use the "default" code to simplify the above example a
little.  The following "files" block is identical to the long form:

...
  files {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    default  = return
  }
...

Because explicitly set codes take precedence over "default", its
position does not matter.  When "default" is put first, the later
definitions still override its return code, so this block is
identical as well:

...
  files {
    default  = return
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
  }
...
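
One way to get this order-independence is to record which codes were set
explicitly, and let "default" fill in only the rest. The C sketch below is
hypothetical (the names and the ACT_RETURN value are invented); it only
shows why the two "files" blocks produce the same table.

```c
#include <stdbool.h>

/* Hypothetical result codes, in the order of the table in section 2. */
typedef enum {
    RC_NOTFOUND, RC_NOOP, RC_OK, RC_UPDATED, RC_FAIL,
    RC_REJECT, RC_USERLOCK, RC_INVALID, RC_HANDLED, RC_NUMCODES
} rcode;

#define ACT_RETURN (-1)             /* "= return" (hypothetical value) */

/* One "name = value" line inside a module's sub-section.
 * `code` is ignored when `is_default` is true. */
typedef struct {
    bool is_default;
    rcode code;
    int action;
} entry;

/* Build an action table.  Explicit entries always win over "default",
 * whatever order the lines appear in. */
static void build_actions(const entry *lines, int n, int out[RC_NUMCODES])
{
    bool explicit_set[RC_NUMCODES] = { false };
    for (int i = 0; i < n; i++) {
        if (lines[i].is_default) {
            /* only touch codes nobody has set explicitly yet */
            for (int c = 0; c < RC_NUMCODES; c++)
                if (!explicit_set[c])
                    out[c] = lines[i].action;
        } else {
            out[lines[i].code] = lines[i].action;
            explicit_set[lines[i].code] = true;
        }
    }
}
```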


[Take a deep breath - the worst is over]

That RESULT preference/desirability stuff is pretty complex, but my hope is
that it will be complex enough to handle the needs of everyone's real-world
imperfect systems, while staying out of sight most of the time since the
defaults will be right for the most common configurations.

So where does redundant{...} fit in with all that? Well, redundant{...} is
simply a group that changes the default ACTIONs to something like

  fail = 1
  everythingelse = return

so that when one module fails, we keep trying until we find one that doesn't
fail, then return whatever it returned. And at the end, if they all failed,
the redundant GROUP as a whole returns RLM_MODULE_FAIL, just as you'd want it
to (I hope).

There are two other kinds of grouping: group{...}, which does not have any
specialized default ACTIONs, and append{...}, which should be used when you
have separate but similarly structured databases that are guaranteed not to
overlap.

That's all that really needs to be said. But now a few random notes:

1. GROUPs may have RESULT=ACTION specifiers too! It would look like this:

  authorize {
    preprocess
    redundant {
      sql1
      sql2
      notfound = return
    }
    files
  }

which would prevent rlm_files from being called if neither of the SQL
instances could find the user.

2. redundant{...} and append{...} are just shortcuts. You could write
    group {
      sql1 {
        fail     = 1
        notfound = 2
        noop     = return
        ok       = return
        updated  = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
      }
      sql2 {
        fail     = 1
        notfound = 2
        noop     = return
        ok       = return
        updated  = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
      }
    }
  instead of
    redundant {
      sql1
      sql2
    }
  but the latter is just a whole lot easier to read.

3. "authenticate{...}" itself is not a GROUP, even though it contains a list
of Auth-Type GROUPs, because its semantics are totally different - it uses
Auth-Type to decide which of its members to call, and their order is
irrelevant.

4. The default rules are context-sensitive - for authorize, the defaults are
what you saw above - notfound, noop, ok, and updated are considered
success, and anything else has an ACTION of "return". For authenticate, the
default is to return on success *or* reject, and only try the second and
following items if the first one fails. You can read all the default ACTIONs
in modcall.c (int defaultactions[][][]), or just trust me. They do the right
thing.

5. There are some rules that can't be implemented in this language - things
like "notfound = 1-reject", "noop = 2-ok", "ok = 3-ok", etc. But I don't feel
justified in adding that complexity in the first draft. There are already
enough things here that may never see real-world usage, like append{...}.

-- Pac. 9/18/2000