Configurable Module Fail Over
-----------------------------

Before configurable module failover, we had this kind of entry in
radiusd.conf:
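
A minimal sketch of such an entry, reconstructed from the description of
the "preprocess" and "files" modules in this section. A real "authorize"
section would normally list more modules than this:

```
authorize {
    preprocess
    files
}
```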

This entry instructed the "authorize" section to first process the
request through the "preprocess" module, and if that returned success,
to process it through the "files" module. If that sequence returned
success, then the "authorize" stage itself would return success.
Processing was strictly linear, and if one module failed, the whole
section would fail immediately.

Configurable failover provides more flexibility. It takes advantage
of the tree structure of radiusd.conf to support a configuration
language that allows you to "group" modules that should work together
in ways other than simple lists. You can control the flow of any
stage (e.g. "authorize") to fit your needs, without touching C code,
just by altering radiusd.conf.

This configurable fail-over has a convenient short-hand, too.
Administrators commonly want to say things like "try SQL1, if it's
down, try SQL2, otherwise drop the request."

  modules {
    sql sql1 {
      # configuration to connect to SQL database one
    }
    sql sql2 {
      # configuration to connect to SQL database two
    }
  }

  # Handle accounting packets
  accounting {
    detail          # always log to detail, stopping if it fails
    redundant {
      sql1          # try module sql1
      sql2          # if that's down, try module sql2
      handled       # otherwise drop the request as it's been "handled"
                    # by the "always" module (see doc/rlm_always)
    }
  }

The "redundant" section is a configuration directive which tells the
server to process the second module if the first one fails. Any
number of modules can be listed in a "redundant" section. The server
will process each in turn, until one of the modules succeeds. It will
then stop processing the "redundant" list.

1. Rewriting results for single modules
---------------------------------------

Normally, when a module fails, the entire section ("authorize",
"accounting", etc.) stops being processed. In some cases, we may want
to permit "soft failures". That is, we may want to tell the server
that it is "ok" for a module to fail, and that the failure should not
be treated as a fatal error.

In this case, the module is treated as a "section", rather than just
as a single line in "radiusd.conf". The configuration entries for
that section are taken from the "configurable fail-over" code, and not
from the configuration information for that module.

For example, the "detail" module normally returns "fail" if it is
unable to write its information to the "detail" file. As a test, we
can configure the server so that it continues processing the request,
even if the "detail" module fails. The following example shows how:

  # Handle accounting packets
  accounting {
    detail {
      fail = 1
    }
  }

The "fail = 1" entry tells the server to remember the "fail" code,
with priority "1". The normal configuration is "fail = return", which
means "if the detail module fails, stop processing the accounting
section".

2. Fail-over configuration entries
----------------------------------

Modules normally return one of the following codes as their result:

  Code       Meaning
  ----       -------
  notfound   the user was not found
  noop       the module did nothing
  ok         the module succeeded
  updated    the module updated information in the request
  fail       the module failed
  reject     the module rejected the user
  userlock   the user was locked out
  invalid    the user's configuration entry was invalid
  handled    the module has done everything to handle the request

In a configurable fail-over section, each of these codes may be
listed, with a value. If the code is not listed, or a configurable
fail-over section is not defined, then values that make sense for the
requested "group" (group, redundant, load-balance, etc.) are used.

The special code "default" can be used to set all return codes to
the specified value. This value will be used with a lower priority
than ones that are explicitly set.

The values for each code may be one of the following:

  Value      Meaning
  -----      -------
  <number>   Priority for this return code.
  return     Stop processing this configurable fail-over list.
  reject     Stop processing this configurable fail-over list,
             and immediately return a reject.

The <number> used for a value may be any decimal number between 1
and 99999. The number is used when processing a list of modules, to
determine which code is returned from the list. For example, if
"module1" returns "fail" with priority "1", and a later "module2"
returns "ok" with priority "3", the return code from the list of
modules will be "ok", because it has a higher priority than "fail".

This configurability allows the administrator to permit some modules
to fail, so long as a later module succeeds.
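
The priority example above could be written roughly as follows. This is
only a sketch: "module1" and "module2" are the placeholder names used in
the text, not real modules, and wrapping them in a "group" is an
assumption made to keep the fragment self-contained:

```
group {
    module1 {
        fail = 1    # remember "fail" with priority 1, and keep going
    }
    module2 {
        ok = 3      # remember "ok" with priority 3
    }
}   # "ok" (3) outranks "fail" (1), so the list returns "ok"
```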

3. More Complex Configurations
------------------------------

The "authorize" section is normally a list of module names. We can
create sub-lists by using the section name "group". The "redundant"
section above is just a short-hand for "group", with a set of default
return codes, which are different from the normal "stop processing the
list on failure".

For example, we can configure two detail modules, and allow either
to fail, so long as one of them succeeds.

  # Handle accounting packets
  accounting {
    group {
      detail1 {
        fail = 1        # remember "fail" with priority 1
        ok = return     # if we succeed, don't do "detail2"
      }
      detail2 {
        fail = 1        # remember "fail" with priority 1
        ok = return     # if we succeed, return "ok"
                        # if "detail1" returned "fail"
      }
    } # returns "fail" only if BOTH modules returned "fail"
  }

This configuration says:

  Log to "detail1", and stop processing the "group" list if
  "detail1" returned "ok".

  If "detail1" returned "fail", then continue, but remember the
  "fail" code, with priority 1.

  If "detail2" fails, then remember "fail" with priority 1.

  If "detail2" returned "ok", return "ok" from the "group".

The return code from the "group" is the return code which was either
forced to return (e.g. "ok" for "detail1"), or the highest priority
return code found by processing the list.

This process can be extended to any number of modules listed in a
"group" section.

4. More Complex Configuration using "if" and "else"
---------------------------------------------------

As of version 2.0, the server allows "if"-style checking in the
configuration sections. The section is still processed as a list, so
there is no looping or "goto" support. But by using "if", the
administrator can have branching paths of execution, where none was
possible before.

For example, the following configuration says "run sql, if it
returns notfound, run ldap1, else run ldap2".
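
A sketch of what that configuration might look like, following the
syntax rules described in this section ("if" and "else" first on their
own lines, no brackets around the condition). Placing it inside
"authorize" is an assumption:

```
authorize {
    sql
    if notfound {
        ldap1
    }
    else {
        ldap2
    }
}
```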

Note that the parser is easily confused. The words "if" and "else"
MUST be the first entry on the line. Using statements like "} else {"
is forbidden, and will prevent the server from starting. Putting
brackets around the condition like "if (notfound)" won't work, either.

If you want to specify multiple conditions, put them in double
quotes, and separate the conditions by a single '|' character, as
follows:

    if "notfound | ok | fail" {
        ...
    }

The reason for these limitations is that the "if" conditions are
overloading module names, and therefore have to follow a similar
syntax. These limitations may be removed in a future release.

The conditions that can be checked are the names listed in section
2, above. Any other condition is not permitted.

You can also use "elsif", as follows:
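
A sketch of an "elsif" chain under the same syntax rules. The "reject"
condition and the "ldap3" module are hypothetical, chosen only to show
the shape of the chain:

```
authorize {
    sql
    if notfound {
        ldap1
    }
    elsif reject {
        ldap2
    }
    else {
        ldap3
    }
}
```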

Note that the condition being checked is the return code of the last
module or group that was executed. This may sometimes have odd
results. Suppose "sql" is followed by an "if notfound" block that runs
"ldap1", and then by a separate "if fail" block that runs "ldap2".

In this case, the "ldap2" module will be executed if the "sql"
module returns "fail", OR if the "sql" module returns "notfound", and
the "ldap1" module returns "fail". For this reason, you should
probably use "elsif" whenever you have two "if" statements right after
each other.
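
The pitfall can be sketched as follows. The layout is an assumption
pieced together from the explanation above, with two independent "if"
statements in a row:

```
authorize {
    sql
    if notfound {
        ldap1
    }
    if fail {       # tests the LAST result: from "sql", or from "ldap1"
        ldap2       # if the first "if" block ran
    }
}
```

Changing the second "if" to "elsif" ties it to the first test, so
"ldap2" is only considered when "sql" did not return "notfound".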

The "if" checks can be nested to a depth of 30 or so, which should
be sufficient for most configurations.

Some configurations may require using the same list of modules, in
the same order, in multiple sections. For those systems, the
configuration can be simplified through the use of "virtual" modules.
These modules are configured as named sub-sections of the
"instantiate" section, as follows:

    instantiate {
        redundant sql1_or_2 {
            sql1
            sql2
        }
    }

The name "sql1_or_2" can then be used in any other section, such as
"authorize" or "accounting". The result will be *exactly* as if that
section were placed at the location of the "sql1_or_2" reference.

These virtual modules are full-fledged objects in and of themselves.
One virtual module can refer to another virtual module, and they can
contain "if" conditions, or any other configuration permitted in a
section.
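
As a sketch of how a virtual module might then be referenced, assuming
"sql1_or_2" has been defined in "instantiate" as a "redundant" block of
the two SQL modules (the "detail" line is only there for context):

```
accounting {
    detail
    sql1_or_2       # behaves as if the "redundant" block were written here
}
```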

7. Redundancy and Load-Balancing
--------------------------------

See doc/load-balance.txt for information on simple redundancy
(fail-over) and load balancing.

The fundamental object is called a MODCALLABLE, because it is something that
can be passed a specific radius request and returns one of the RLM_MODULE_*
results. It is a function - if you can accept the fact that pieces of
radiusd.conf are functions. There are two kinds of MODCALLABLEs: GROUPs and
SINGLEs.

A SINGLE is a reference to a module instance that was set up in the modules{}
section of radiusd.conf, like "preprocess" or "sql1". When a SINGLE is
called, the corresponding function in the rlm is invoked, and whichever
RLM_MODULE_* it returns becomes the RESULT of the SINGLE.

A GROUP is a section of radiusd.conf that includes some MODCALLABLEs.
Examples of GROUPs above include "authorize{...}", which implements the C
function module_authorize, and "redundant{...}", which contains two SINGLEs
that refer to a couple of redundant databases. Note that a GROUP can contain
other GROUPs - "Auth-Type SQL{...}" is also a GROUP, which implements the C
function module_authenticate when Auth-Type is set to SQL.

Now here's the fun part - what happens when a GROUP is called? It simply runs
through all of its children in order, and calls each one, whether it is
another GROUP or a SINGLE. It then looks at the RESULT of that child, and
takes some ACTION, which is basically either "return that RESULT immediately"
or "keep going". In the first example, any "bad" RESULT from the preprocess
module causes an immediate return, and any "good" RESULT causes the
authorize{...} GROUP to proceed to the files module.

We can see the exact rules by writing them out the long way:
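
A sketch of the long form, reconstructed from the walk-through in this
section. The values "notfound = 1", "noop = 2" and "reject = return"
come from that text; the priorities shown for "ok" and "updated" are
assumptions, and the real defaults live in modcall.c:

```
authorize {
    preprocess {
        notfound = 1
        noop     = 2
        ok       = 3        # assumed priority
        updated  = 4        # assumed priority
        fail     = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
    }
    files {
        notfound = 1
        noop     = 2
        ok       = 3        # assumed priority
        updated  = 4        # assumed priority
        fail     = return
        reject   = return
        userlock = return
        invalid  = return
        handled  = return
    }
}
```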

This is the same as the first example, with the behavior explicitly
spelled out. Each SINGLE becomes its own section, containing a list of
RESULTs that it may return and what ACTION should follow from them. So
preprocess is called, and if it returns for example RLM_MODULE_REJECT,
then the reject=return rule is applied, and the authorize{...} GROUP
itself immediately returns RLM_MODULE_REJECT.

If preprocess returns RLM_MODULE_NOOP, the corresponding ACTION is "2". An
integer ACTION serves two purposes - first, it tells the parent GROUP to go
on to the next module. Second, it is a hint as to how desirable this RESULT
is as a candidate for the GROUP's own RESULT. So files is called... suppose
it returns RLM_MODULE_NOTFOUND. The ACTION for notfound inside the files{...}
block is "1". We have now reached the end of the authorize{...} GROUP and we
look at the RESULTs we accumulated along the way - there is a noop with
preference level 2, and a notfound with preference level 1, so the
authorize{...} GROUP as a whole returns RLM_MODULE_NOOP, which makes sense
because to say the user was not found at all would be a lie, since preprocess
apparently found him, or else it would have returned RLM_MODULE_NOTFOUND too.

We could use the "default" code to simplify the above example a
little. The following two configurations are identical:
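
As a sketch, here is one module block written both ways. The numeric
priorities for "ok" and "updated" are illustrative assumptions; the
point is only that "default" fills in every code not listed explicitly:

```
# every code spelled out
files {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    fail     = return
    reject   = return
    userlock = return
    invalid  = return
    handled  = return
}

# the same thing, using "default" for the remaining codes
files {
    notfound = 1
    noop     = 2
    ok       = 3
    updated  = 4
    default  = return
}
```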

When putting the "default" first, later definitions override its
value:
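
A sketch with "default" listed first, using the same illustrative
priorities as elsewhere in this section. The explicit entries that
follow take precedence over the "default" value:

```
files {
    default  = return
    notfound = 1        # overrides "default" for notfound
    noop     = 2
    ok       = 3
    updated  = 4
}
```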

[Take a deep breath - the worst is over]

That RESULT preference/desirability stuff is pretty complex, but my hope is
that it will be complex enough to handle the needs of everyone's real-world
imperfect systems, while staying out of sight most of the time since the
defaults will be right for the most common configurations.

So where does redundant{...} fit in with all that? Well, redundant{...} is
simply a group that changes the default ACTIONs to something like

    fail = 1
    everythingelse = return

so that when one module fails, we keep trying until we find one that doesn't
fail, then return whatever it returned. And at the end, if they all failed,
the redundant GROUP as a whole returns RLM_MODULE_FAIL, just as you'd want it
to.

There are two other kinds of grouping: group{...} which does not have any
specialized default ACTIONs, and append{...}, which should be used when you
have separate but similarly structured databases that are guaranteed not to
overlap.

That's all that really needs to be said. But now a few random notes:

1. GROUPs may have RESULT=ACTION specifiers too! Placed inside a GROUP
that holds the two SQL instances, ahead of the files module, such a
specifier would prevent rlm_files from being called if neither of the SQL
instances could find the user.
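
A sketch of a GROUP-level specifier of that kind. The exact layout is an
assumption; the idea is that "notfound = return" applies to the RESULT
of the redundant{...} GROUP as a whole, so "files" is never reached when
both SQL instances report notfound:

```
authorize {
    preprocess
    redundant {
        sql1
        sql2
        notfound = return
    }
    files
}
```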

2. redundant{...} and append{...} are just shortcuts. You could write a
plain group{...} with the ACTIONs spelled out for every module instead of
using the shortcut, but the latter is just a whole lot easier to read.
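
The two spellings might look like this. The ACTIONs in the long form are
an assumption based on the description of the redundant{...} defaults
("fail" keeps going, everything else returns), not a copy of the
server's actual table:

```
# long form
group {
    sql1 {
        fail    = 1
        default = return
    }
    sql2 {
        fail    = 1
        default = return
    }
}

# short form
redundant {
    sql1
    sql2
}
```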

3. "authenticate{...}" itself is not a GROUP, even though it contains a list
of Auth-Type GROUPs, because its semantics are totally different - it uses
Auth-Type to decide which of its members to call, and their order is
irrelevant.

4. The default rules are context-sensitive - for authorize, the defaults are
what you saw above - notfound, noop, ok, and updated are considered
success, and anything else has an ACTION of "return". For authenticate, the
default is to return on success *or* reject, and only try the second and
following items if the first one fails. You can read all the default ACTIONs
in modcall.c (int defaultactions[][][]), or just trust me. They do the right
thing.

5. There are some rules that can't be implemented in this language - things
like "notfound = 1-reject", "noop = 2-ok", "ok = 3-ok", etc. But I don't feel
justified adding that complexity in the first draft.
There are already enough things here that may never see real-world usage.