doc/configurable_failover

        Configurable Module Fail Over
        -----------------------------

Before configurable module failover, we had this kind of entry in
"radiusd.conf":

#---
authorize {
  preprocess
  files
}
#---

  This entry instructed the "authorize" section to first process the
request through the "preprocess" module, and if that returned success,
to process it through the "files" module.  If that sequence returned
success, then the "authorize" stage itself would return success.
Processing was strictly linear and if one module failed, the whole
section would fail immediately.

  Configurable failover provides more flexibility.  It takes advantage
of the tree structure of radiusd.conf to support a configuration
language that allows you to "group" modules that should work together
in ways other than simple lists.  You can control the flow of any
stage (e.g. "authorize") to fit your needs, without touching C code,
just by altering radiusd.conf.

  This configurable fail-over has a convenient short-hand, too.
Administrators commonly want to say things like "try SQL1, if it's
down, try SQL2, otherwise drop the request."

  For example:

#---
  modules {
    sql sql1 {
      # configuration to connect to SQL database one
    }
    sql sql2 {
      # configuration to connect to SQL database two
    }
    always handled {
      rcode = handled
    }
  }

  #  Handle accounting packets
  accounting {
      detail                    # always log to detail, stopping if it fails
      redundant {
        sql1                    # try module sql1
        sql2                    # if that's down, try module sql2
        handled                 # otherwise drop the request as
                                # it's been "handled" by the "always"
                                # module (see doc/rlm_always)
      }
  }
#---

  The "redundant" section is a configuration directive which tells the
server to process the second module if the first one fails.  Any
number of modules can be listed in a "redundant" section.  The server
will process each in turn, until one of the modules succeeds.  It will
then stop processing the "redundant" list.
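
  As a rough illustration, the behaviour of "redundant" can be
modelled in a few lines of Python.  This is only a sketch, not the
server's code: run_redundant() and the stand-in module functions are
hypothetical names invented here, and each module is assumed to be a
plain function that returns its result code as a string.

```python
def run_redundant(modules):
    """Call each module in turn and stop at the first one that does not fail."""
    for module in modules:
        result = module()
        if result != "fail":
            return result       # this module did not fail: stop processing the list
    return "fail"               # every module in the list failed


# Hypothetical stand-ins for the "sql1", "sql2" and "handled" instances.
def sql1():
    return "fail"               # pretend database one is down

def sql2():
    return "ok"                 # database two is up

def handled():
    return "handled"            # the "always" module always returns "handled"


print(run_redundant([sql1, sql2, handled]))   # sql2 answers, so "handled" is never tried
```

  If both databases were down, the same call would fall through to
the "handled" stand-in and return "handled".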

  Rewriting results for single modules
  ------------------------------------

  Normally, when a module fails, the entire section ("authorize",
"accounting", etc.) stops being processed.  In some cases, we may want
to permit "soft failures".  That is, we may want to tell the server
that it is "ok" for a module to fail, and that the failure should not
be treated as a fatal error.

  In this case, the module is treated as a "section", rather than just
as a single line in "radiusd.conf".  The configuration entries for
that section are taken from the "configurable fail-over" code, and not
from the configuration information for that module.

  For example, the "detail" module normally returns "fail" if it is
unable to write its information to the "detail" file.  As a test, we
can configure the server so that it continues processing the request,
even if the "detail" module fails.  The following example shows how:

#---
  #  Handle accounting packets
  accounting {
      detail {
        fail = 1
      }
      redundant {
        sql1
        sql2
        handled
      }
  }
#---

  The "fail = 1" entry tells the server to remember the "fail" code,
with priority "1", and to continue processing.  The normal
configuration is "fail = return", which means "if the detail module
fails, stop processing the accounting section".

  Fail-over configuration entries
  -------------------------------

  Modules normally return one of the following codes as their result:

        Code            Meaning
        ----            -------
        notfound        the user was not found
        noop            the module did nothing
        ok              the module succeeded
        updated         the module updated information in the request
        fail            the module failed
        reject          the module rejected the user
        userlock        the user was locked out
        invalid         the user's configuration entry was invalid
        handled         the module has done everything to handle the request

  In a configurable fail-over section, each of these codes may be
listed, with a value.  If the code is not listed, or a configurable
fail-over section is not defined, then values that make sense for the
requested "group" (group, redundant, load-balance, etc) are used.

  The values for each code may be one of three things:

        Value           Meaning
        -----           -------
        <number>        Priority for this return code.
        return          Stop processing this configurable fail-over list.
        reject          Stop processing this configurable fail-over list,
                        and immediately return a reject.

  The <number> used for a value may be any decimal number between 1
and 99999.  The number is used when processing a list of modules, to
determine which code is returned from the list.  For example, if
"module1" returns "fail" with priority "1", and a later "module2"
returns "ok" with priority "3", the return code from the list of
modules will be "ok", because it has higher priority than "fail".

  This configurability allows the administrator to permit some modules
to fail, so long as a later module succeeds.
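
  The selection rule can be sketched in Python as follows.  This is
an illustration only: list_result() is a hypothetical helper, not a
function in the server.  The text above does not say what happens when
two codes have the same priority, so keeping the earlier one is an
arbitrary choice made for this sketch.

```python
def list_result(remembered):
    """Return the code with the highest priority.

    remembered is a list of (code, priority) pairs, in the order the
    modules ran.  On a tie, max() keeps the earliest pair (an arbitrary
    choice for this sketch).
    """
    code, _priority = max(remembered, key=lambda pair: pair[1])
    return code


# "module1" returned "fail" with priority 1, "module2" returned "ok" with priority 3.
print(list_result([("fail", 1), ("ok", 3)]))   # prints: ok
```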


  More Complex Configurations
  ---------------------------

  The "authorize" section is normally a list of module names.  We can
create sub-lists by using the section name "group".  The "redundant"
section above is just a short-hand for "group", with a set of default
return codes, which are different from the normal "stop processing the
list on failure".

  For example, we can configure two detail modules, and allow either
to fail, so long as one of them succeeds.

#---
  #  Handle accounting packets
  accounting {
      group {
        detail1 {
          fail = 1              # remember "fail" with priority 1
          ok = return           # if we succeed, don't do "detail2"
        }
        detail2 {
          fail = 1              # remember "fail" with priority 1
          ok = return           # if we succeed, return "ok", even
                                # if "detail1" returned "fail"
        }
      }                 # returns "fail" only if BOTH modules returned "fail"
      redundant {
        sql1
        sql2
        handled
      }
  }
#---

  This configuration says:

        Log to "detail1", and stop processing the "group" list if
        "detail1" returned "ok".

        If "detail1" returned "fail", then continue, but remember the
        "fail" code, with priority 1.

        If "detail2" fails, then remember "fail" with priority 1.

        If "detail2" returned "ok", return "ok" from the "group".

  The return code from the "group" is the return code which was either
forced to return (e.g. "ok" for "detail1"), or the highest priority
return code found by processing the list.

  This process can be extended to any number of modules listed in a
"group" section.
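
  The same logic, for any number of modules, can be sketched in
Python.  This is a simplified model rather than the server's
implementation: run_group() and the stand-in module functions are
hypothetical, and only the two codes configured in the example ("fail"
and "ok") are covered.

```python
RETURN = "return"

def run_group(children):
    """Process (module, actions) pairs in order and return the group's code.

    actions maps a result code to either a priority number or RETURN.
    """
    best_code, best_priority = None, 0
    for module, actions in children:
        code = module()
        action = actions[code]
        if action == RETURN:
            return code                      # forced return: stop the list here
        if action > best_priority:           # remember the highest priority so far
            best_code, best_priority = code, action
    return best_code


actions = {"fail": 1, "ok": RETURN}          # as configured for detail1 and detail2

def failing():                               # hypothetical module that cannot write
    return "fail"

def working():                               # hypothetical module that succeeds
    return "ok"

print(run_group([(failing, actions), (working, actions)]))   # prints: ok
print(run_group([(failing, actions), (failing, actions)]))   # prints: fail
```

  A code with no entry in the table raises a KeyError in this sketch;
the server would instead fall back to the defaults for the section.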


  The Gory Details
  ----------------

The fundamental object is called a MODCALLABLE, because it is something that
can be passed a specific RADIUS request and returns one of the RLM_MODULE_*
results. It is a function - if you can accept the fact that pieces of
radiusd.conf are functions. There are two kinds of MODCALLABLEs: GROUPs and
SINGLEs.

A SINGLE is a reference to a module instance that was set up in the modules{}
section of radiusd.conf, like "preprocess" or "sql1". When a SINGLE is
called, the corresponding function in the rlm is invoked, and whichever
RLM_MODULE_* it returns becomes the RESULT of the SINGLE.

A GROUP is a section of radiusd.conf that includes some MODCALLABLEs.
Examples of GROUPs above include "authorize{...}", which implements the C
function module_authorize, and "redundant{...}", which contains two SINGLEs
that refer to a couple of redundant databases. Note that a GROUP can contain
other GROUPs - "Auth-Type SQL{...}" is also a GROUP, which implements the C
function module_authenticate when Auth-Type is set to SQL.

Now here's the fun part - what happens when a GROUP is called? It simply runs
through all of its children in order, and calls each one, whether it is
another GROUP or a SINGLE. It then looks at the RESULT of that child, and
takes some ACTION, which is basically either "return that RESULT immediately"
or "keep going". In the first example, any "bad" RESULT from the preprocess
module causes an immediate return, and any "good" RESULT causes the
authorize{...} GROUP to proceed to the files module.

We can see the exact rules by writing them out the long way:

authorize {
  preprocess {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    fail     = return
    reject   = return
    userlock = return
    invalid  = return
    handled  = return
  }
  files {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    fail     = return
    reject   = return
    userlock = return
    invalid  = return
    handled  = return
  }
}

This is the same as the first example, with the behavior explicitly
spelled out. Each SINGLE becomes its own section, containing a list of
RESULTs that it may return and what ACTION should follow from them. So
preprocess is called, and if it returns, for example, RLM_MODULE_REJECT,
then the reject=return rule is applied, and the authorize{...} GROUP
itself immediately returns RLM_MODULE_REJECT.

If preprocess returns RLM_MODULE_NOOP, the corresponding ACTION is "2". An
integer ACTION serves two purposes - first, it tells the parent GROUP to go
on to the next module. Second, it is a hint as to how desirable this RESULT
is as a candidate for the GROUP's own RESULT. So files is called... suppose
it returns RLM_MODULE_NOTFOUND. The ACTION for notfound inside the files{...}
block is "1". We have now reached the end of the authorize{...} GROUP and we
look at the RESULTs we accumulated along the way - there is a noop with
preference level 2, and a notfound with preference level 1, so the
authorize{...} GROUP as a whole returns RLM_MODULE_NOOP, which makes sense
because to say the user was not found at all would be a lie, since preprocess
apparently found him, or else it would have returned RLM_MODULE_NOTFOUND too.
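
The walk-through above can be replayed with a small Python model. This is a
sketch under assumptions, not the code in modcall.c: the names
AUTHORIZE_DEFAULTS and call_group are made up for the illustration, the table
simply copies the long-form example above, and each module is reduced to the
RESULT it happens to return.

```python
RETURN = "return"

# The ACTIONs from the long-form authorize example, keyed by RESULT.
AUTHORIZE_DEFAULTS = {
    "notfound": 1, "noop": 2, "ok": 3, "updated": 4,
    "fail": RETURN, "reject": RETURN, "userlock": RETURN,
    "invalid": RETURN, "handled": RETURN,
}

def call_group(results, actions=AUTHORIZE_DEFAULTS):
    """results lists the RESULT each child returns, in call order."""
    accumulated = []                          # (preference, RESULT) pairs seen so far
    for result in results:
        action = actions[result]
        if action == RETURN:
            return result                     # return that RESULT immediately
        accumulated.append((action, result))  # keep going, remember the preference
    return max(accumulated)[1]                # most preferred RESULT wins


print(call_group(["noop", "notfound"]))       # preprocess: noop, files: notfound -> noop
print(call_group(["reject", "ok"]))           # preprocess rejects, files is never consulted
```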

[Take a deep breath - the worst is over]

That RESULT preference/desirability stuff is pretty complex, but my hope is
that it will be complex enough to handle the needs of everyone's real-world
imperfect systems, while staying out of sight most of the time since the
defaults will be right for the most common configurations.

So where does redundant{...} fit in with all that? Well, redundant{...} is
simply a group that changes the default ACTIONs to something like

  fail = 1
  everythingelse = return

so that when one module fails, we keep trying until we find one that doesn't
fail, then return whatever it returned. And at the end, if they all failed,
the redundant GROUP as a whole returns RLM_MODULE_FAIL, just as you'd want it
to (I hope).

There are two other kinds of grouping: group{...} which does not have any
specialized default ACTIONs, and append{...}, which should be used when you
have separate but similarly structured databases that are guaranteed not to
overlap.

That's all that really needs to be said. But now a few random notes:

1. GROUPs may have RESULT=ACTION specifiers too! It would look like this:

  authorize {
    preprocess
    redundant {
      sql1
      sql2
      notfound = return
    }
    files
  }

which would prevent rlm_files from being called if neither of the SQL
instances could find the user.
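
A nested GROUP can be modelled the same way. The Python below is a
hypothetical sketch, not the server's implementation: Single, Group and their
fields are invented names, the redundant{...} ACTIONs are taken from the
long-form expansion in note 2 below, the authorize ACTIONs from the long-form
example earlier, and it is an assumption that the redundant GROUP otherwise
inherits the authorize defaults in its parent.

```python
RETURN = "return"
CODES = ["notfound", "noop", "ok", "updated", "fail",
         "reject", "userlock", "invalid", "handled"]

# Defaults copied from the long-form examples in this document.
AUTHORIZE = dict.fromkeys(CODES, RETURN)
AUTHORIZE.update(notfound=1, noop=2, ok=3, updated=4)
REDUNDANT = dict.fromkeys(CODES, RETURN)
REDUNDANT.update(fail=1, notfound=2)


class Single:
    """A module instance, reduced here to the RESULT it returns."""
    def __init__(self, name, result, actions):
        self.name, self.result, self.actions = name, result, actions

    def call(self, log):
        log.append(self.name)                # record that the module ran
        return self.result


class Group:
    """A list of children, plus the ACTIONs its parent applies to its RESULT."""
    def __init__(self, children, actions=None):
        self.children, self.actions = children, actions

    def call(self, log):
        best = None                          # (preference, RESULT) kept so far
        for child in self.children:
            result = child.call(log)
            action = child.actions[result]
            if action == RETURN:
                return result                # return that RESULT immediately
            if best is None or action > best[0]:
                best = (action, result)
        return best[1]


# redundant { sql1 sql2 notfound = return } inside authorize { ... }
redundant = Group(
    [Single("sql1", "notfound", REDUNDANT), Single("sql2", "notfound", REDUNDANT)],
    actions={**AUTHORIZE, "notfound": RETURN},
)
authorize = Group([
    Single("preprocess", "ok", AUTHORIZE),
    redundant,
    Single("files", "ok", AUTHORIZE),
])

log = []
print(authorize.call(log), log)   # notfound ['preprocess', 'sql1', 'sql2']
```

Both SQL stand-ins report notfound, the redundant GROUP returns notfound, and
the "notfound = return" specifier ends the authorize GROUP before files runs.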

2. redundant{...} and append{...} are just shortcuts. You could write
    group {
      sql1 {
        fail     = 1
        notfound = 2
        noop     = return
        ok       = return
        updated  = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
      }
      sql2 {
        fail     = 1
        notfound = 2
        noop     = return
        ok       = return
        updated  = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
      }
    }
  instead of
    redundant {
      sql1
      sql2
    }
  but the latter is just a whole lot easier to read.

3. "authenticate{...}" itself is not a GROUP, even though it contains a list
of Auth-Type GROUPs, because its semantics are totally different - it uses
Auth-Type to decide which of its members to call, and their order is
irrelevant.

4. The default rules are context-sensitive - for authorize, the defaults are
what you saw above - notfound, noop, ok, and updated are considered
success, and anything else has an ACTION of "return". For authenticate, the
default is to return on success *or* reject, and only try the second and
following items if the first one fails. You can read all the default ACTIONs
in modcall.c (int defaultactions[][][]), or just trust me. They do the right
thing.

5. There are some rules that can't be implemented in this language - things
like "notfound = 1-reject", "noop = 2-ok", "ok = 3-ok", etc. But I don't feel
justified adding that complexity in the first draft. There are already enough
things here that may never see real-world usage, like append{...}.

-- Pac. 9/18/2000