doc/cache

   1 Cistron Password Caching
   2
   3 Acknowledgements
   4
   5 Thanks to Alan DeKok for the initial idea, to Dr. Bob Pilgrim for
   6 implementation strategies, and to Miquel for putting it into cistron.
   7
   8 What does this caching do?
   9
  10 It will add the capability to cache several of the system files that
  11 are used often by cistron radius. These include: /etc/passwd, /etc/shadow
  12 (if necessary), and /etc/group.
  13
  14 By caching the information in these files and storing it in a VERY efficient
  15 hash table, authentication lookups are sped up many times over. So, the
  16 short answer is that your authentication requests will be faster if you use
  17 caching, assuming you're using System authenication now. Read on for the
  18 long answer.
  19
  20 How do I use it?
  21
  22 Compile radiusd as you normally would, and then set 'cache=yes' in your
  23 radiusd.conf file.
  24
  25 Then, if you want to monitor how the caching is doing (errors and such),
  26 'grep HASH /var/log/radius.log'.
  27
  28 What are the advantages?
  29
  30 Tremendous speed increase in authentication request handling. Orders of
  31 magnitude, literally.
  32
  33 What are the disadvantages?
  34
  35 As with anything else, there are a couple of trade-offs. However, these are
  36 quite modest, IMHO.
  37
  38 First, memory consumption of your radiusd process will go up somewhat over
  39 normal operation. Depending on the size of your files, it could take several
  40 megabytes of RAM to build the hash table properly. However, I have gone to
  41 great lengths to ensure that there are no memory leaks. The size of the
  42 daemon process should stay constant, or grow linearly with the size of your
  43 system files. If you notice otherwise, send me an email with lots of detail
  44 about your system and I'll see what I can find.
  45
  46 Second, you should kill -HUP the main radiusd process every so often out of
  47 cron. The reason for this is because the daemon will need to rebuild the
  48 cache to get passwords that you have changed for existing users. You will
  49 not have to -HUP it for new users in /etc/passwd to get authenticated, as if
  50 it can't find the user in the hash, it will automatically revert to the old
  51 (slow) lookup methods [getpwnam()]. However, if you change the password for
  52 a user, and the hash table contains that user but with the old password, it
  53 will simply think that the user is supplying the wrong password and deny
  54 them access. One thing I'd like to point out is that rebuilding the hash
  55 with the -HUP takes less than a second with our ~1MB /etc/passwd and shadow
  56 files. The slowest part of the -HUP is caused by the gethostby*() functions.
  57 All in all, a -HUP reconfig is virtually instant on our solaris boxes, and
  58 takes about 15 seconds on standard linux 2.x machines.
  59
  60 Why was it written?
  61
  62 In examining our radius servers, I noticed that there were several things
  63 slowing down an authentication requests. Please note that some of these
  64 things will depend on what you are doing in your raddb/users file, however,
  65 most of the things we were doing were quite common, so I feel that others
  66 may have similar setups. In any case, I'll go thru exactly what we were
  67 seeing, and how caching fixes it.
  68
  69 Authentication Bottlenecks:
  70 *Group check items calling getgrnam()
  71 *DEFAULT users with System authentication calling getpwnam()
  72 *Simultaneous use check reading file /var/log/radutmp
  73
  74 First, the auth request comes in, and radius (usually) forks a child to
  75 handle it. The child will then check the raddb/users file to see if the user
  76 is explicitly listed there. In our case, most the users were matched off of
  77 a DEFAULT system entry, meaning they would be looked up in the /etc/passwd
  78 and /etc/shadow files. However, first, we had to check for a couple of
  79 groups that aren't allowed access. In our case we had two DEFAULT group
  80 compare check items, meaning that the getgrnam() system call was called
  81 twice for every authentication request. Considering that the /etc/group file
  82 is not so large, that wasn't a huge factor. However, the /etc/group file is
  83 stored on disk, and was accessed through system calls, meaning it had to be
  84 opened and searched until a matching group was found (twice), slowing down
  85 authentication requestions somewhat.
  86
  87 Ok, so we've established that there's a getgrnam() call for each DEFAULT
  88 user you have with a group check item. If you don't have any such DEFAULTs,
  89 no problem, you don't have those calls. However, nearly 99% of you probably
  90 have a line like the following in your raddb/users file, leading into the
  91 next bottleneck: DEFAULT Auth-Type = System, Simultaneous-Use = 1
  92
  93 What does that mean? It means, for every auth request that comes in for
  94 which we don't have an explicit match in our raddb/users file, lookup the
  95 user in /etc/passwd and/or /etc/shadow using getpwnam() and getspwnam()
  96 system calls. So what's so bad about that? It works right? Well, yes, but
  97 it's inefficient because of the way getpwnam() works.
  98
  99 Think for a second about what getpwnam() has to do. It is a linear search
 100 algorithm through an unsorted list stored on disk. It is an O(N) algorithm,
 101 for you computer science folks (barring any caching build into the function
 102 by particular OSs). So, basically, what it does is open the /etc/passwd
 103 file, start at line one, compare that user to the user we're looking for,
 104 and if there's no match, go to the next line and repeat. That means that on
 105 average, every auth request you're taking requires opening and reading half
 106 of your /etc/passwd file. Worst case is when a user not in the file is
 107 looked up, as you search the entire file without a match. Best case is when
 108 the user is the first one in the file, and average is in the middle
 109 somewhere.
 110
 111 Are you saying, that doesn't sound so bad? Well, then perhaps caching
 112 isn't for you. In our case, we had a ~15,000 line passwd file, and anywhere
 113 from 20-40,000 logins daily. The getpwnam() call was taking by far the
 114 majority of the radius daemon's time, resulting in fairly poor
 115 authentication performance.
 116
 117 NOTE:  radutmp caching is disabled for now until we can find a better way
 118 to do it.  Still, this next paragraph indicates why it needs doing.
 119
 120 Lastly, if you're using the wonderful simultaneous-use check item in cistron
 121 radius, you have another disk lookup to perform on your /var/log/radutmp
 122 file (actually, caching of this file just came available in b17. glad
 123 somebody else noticed it :) What happens is, if the user is authenticated
 124 properly, the daemon then opens the radutmp file, reads until it hits a line
 125 matching the user currently requesting authentication, and then either
 126 allows or denies access based on the presense or absense of a match. Thus,
 127 in the majority of cases, you read the entire file in for every
 128 authentication request (assuming that most of your users aren't trying to
 129 steal service from you :) .
 130
 131 So, we've identified the problems. If you really care about how caching
 132 addresses those problems, keep reading.
 133
 134 How does it work?
 135
 136 The code will first open your /etc/passwd file and runs through it hashing
 137 all the usernames into a numeric index into my hash table. It then creates
 138 an entry into the hashtable at that index for every user in the file.
 139 Currently, it caches username, password, gid, and uid in the entry. Then, if
 140 you have shadow passwords, it will open your /etc/shadow file and put the
 141 passwords into the entry corresponding with the appropriate user. It is this
 142 hash table that allows us to eliminate the getpwnam() bottleneck. Read below
 143 for how to test it.
 144
 145 Next, it will read the /etc/group file and build a singly linked list of it.
 146 It's not as fast as the hash, but the group file is usually small enough
 147 that it wouldn't make a difference. Just brining it into RAM should be good
 148 enough. This takes care of the getgrnam() bottleneck.
 149
 150 NOTE:  Again, radutmp caching is disabled.  Disregard this next paragraph.
 151
 152 Finally, it reads the /var/log/radutmp file and marks every user listed as
 153 logged in with a login variable stored in their hash entry. Then, every time
 154 a user logs in or out, that variable is updated so that we can avoid reading
 155 in /var/log/radutmp.
 156
 157 So what, right? Well, understand that for every user lookup done on a
 158 system, you have currently to read half of your passwd file. Thus, the
 159 number of comparisons done per getpwnam() is the number of lines in your
 160 passwd file divided by two. So do 'wc -l /etc/passwd', divide that by two,
 161 and that's the number of comparisons done per user lookup on your system
 162 right now.
 163
 164 Using the hash table method, I have achieved user lookups in RAM in near
 165 constant time. That is, using my passwd files (everyone's is different,
 166 obviously), I could find a user in the hash in 1.0673 comparisons (on
 167 average). Being that 1.0 is the lowest number of comparisons possible,
 168 that's a pretty significant thing. Again, for the computer science people,
 169 the hash function and table are wide enough such that 87.4% of users are
 170 stored singly in a bucket, 11.7% are stored with two users in a bucket, 0.8%
 171 are stored with 3 users in the bucket, and 0.03% are stored with more than 3
 172 users per bucket. That means that 87% of your lookups are done in one
 173 comparison, versus the number you got when you did the 'wc -l /etc/passwd' /
 174 2 above. This discussion has also neglected the fact that RAM searches are
 175 many, many times faster than disk searches. We'll let it speak for itself.
 176
 177 But, you don't have to take my word for it. I have a test program that I've
 178 written that can show you the difference on your own system between the
 179 speed in lookups using the hash and getpwnam().  You can find the program
 180 cachetest.c at this end of this file.  Simply save it to cachetest.c and
 181 and compile it with:
 182
 183 gcc -o cachetest ./cachetest.c
 184
 185 What it does is read in your passwd file, build a hash of it, and then
 186 perform the number of lookups you specify either with getpwnam() or using
 187 the hash lookup functions.
 188
 189 Before performing tests, please note that the larger the passwd file, the
 190 larger the disparity between the two test methods. So you'll get a more
 191 accurate depiction of what the code will do for your system by running it
 192 on a passwd file on your radius server.
 193
 194 I'd recommend doing the following tests with it:
 195
 196 # Perform 1,000 lookups using getpwnam()
 197 ./cachetest 1000 -s
 198
 199 # Perform 1,000 lookups using the hash
 200 ./cachetest 1000
 201
 202 My guess, no real significant difference in the time taken to complete each
 203 of those. Next test.
 204
 205 # Perform 10,000 lookups using getpwnam()
 206 ./cachetest 10000 -s
 207
 208 # Perform 10,000 lookups using the hash
 209 ./cachetest 10000
 210
 211 Unless you've got a real fast machine, you should have noticed a little bit
 212 of difference between the two that time. Next test
 213
 214 # Perform 50,000 lookups using getpwnam()
 215 ./cachetest 50000 -s
 216
 217 # Perform 50,000 lookups using the hash
 218 ./cachetest 50000
 219
 220 Unless you have a cray, I'm real confident you noticed a difference that
 221 time. If the 50,000 user lookup didn't extraordinarily long on your system,
 222 try this last test.
 223
 224 # Perform 100,000 lookups using getpwnam()
 225 ./cachetest 100000 -s
 226
 227 # Perform 100,000 lookups using the hash
 228 ./cachetest 100000
 229
 230 No doubt, you saw the difference here. Unless you want to be old and gray
 231 when you finish the getpwnam() tests, I'd recommend not going higher than
 232 100,000. Still, it's up to you. Go as high as you want on the hash test.
 233 1,000,000 hash lookups are done in about a second on my pentium 400 running
 234 linux.
 235
 236 How will it help my radius server?
 237
 238 Well, assuming you can live with the disadvantages listed above, it'll make
 239 it faster. If you haven't run the tests in the above section, I'd recommend
 240 doing so.
 241
 242 Is it stable?
 243
 244 As far as I can tell, yes. We've run it on both linux and solaris maches for
 245 about two weeks now, and had zero problems. If you notice a bug, by all
 246 means, let me know and I'll see if I can get a fix out. However, if you
 247 notice system degrading problems with it, you can always turn it off in
 248 your radiusd.conf file.
 249
 250 Ok, that's it. If you made it this far, I'm impressed. Let me know what you
 251 think...
 252
 253 jeff@apex.net
 254
 255 --------------------------------------------------------------------------
 256
 257 /*
 258  * cachetest.c:  Tests performance difference between user lookups in
 259  * the hash table vs. getpwnam()
 260  *
 261  * All users in the passwd/shadow files are stored in a hash table.
 262  * the hash lookup is VERY fast,  generally 1.0673 comparisons per
 263  * lookup.  For the unitiated, that's blazing.  You can't have less
 264  * than one comparison, for example.
 265  *
 266  *      (c) 1999 Author - Jeff Carneal, Apex Internet Services, Inc.
 267  */
 268 /*
 269         COMPILE:  gcc -o cachetest cachetest.c
 270 */
 271 #include<stdio.h>
 272 #include<fcntl.h>
 273 #include<grp.h>
 274 #include<unistd.h>
 275 #include<stdlib.h>
 276 #include<errno.h>
 277 #include<ctype.h>
 278 #include<sys/stat.h>
 279 #include<sys/types.h>
 280 #include<string.h>
 281
 282 /* Misc definitions */
 283 #define BUFSIZE  1024
 284 #define MAXUSERNAME 20
 285 #define HASHTABLESIZE 100000
 286 #define PASSWDFILE "/etc/passwd"
 287
 288 /* Structure definitions */
 289 struct mypasswd {
 290         char    *pw_name;       /* user name */
 291         char    *pw_passwd;     /* user password */
 292         uid_t   pw_uid;         /* user id */
 293         gid_t   pw_gid;         /* group id */
 294         int     loggedin;       /* number of logins */
 295         struct mypasswd *next;  /* next */
 296 };
 297
 298 /* Function prototypes */
 299 int buildHashTable(void);
 300 struct mypasswd *findHashUser(const char *user);
 301 int storeHashUser(struct mypasswd *new, int index);
 302 int hashUserName(const char *s);
 303 int doHashTest(int numusers, unsigned long numlookups);
 304 int usage(void);
 305
 306 /* Make the tables global since so many functions rely on them */
 307 static struct mypasswd *hashtable[HASHTABLESIZE];
 308 static struct mygroup *grphead = NULL;
 309
 310 char allusers[HASHTABLESIZE][MAXUSERNAME];
 311 int usegetpwnam=0;
 312
 313 int main(int argc, char** argv) {
 314         int numusers=0;
 315         unsigned long numlookups=0;
 316
 317         if(argc<2) {
 318                 usage();
 319         }
 320
 321         if(isdigit(argv[1][0])) {
 322                 numlookups = atoi(argv[1]);
 323         } else {
 324                 usage();
 325         }
 326         if((argv[2] != NULL) && !strncmp(argv[2],"-s",2)) {
 327                 usegetpwnam = 1;
 328         }
 329
 330         memset((char *)allusers, 0, MAXUSERNAME*HASHTABLESIZE);
 331
 332         numusers = buildHashTable();
 333         if(numusers < 0) {
 334                 printf("HASH:  Error building hash table!  Quitting...\n");
 335                 exit(1);
 336         }
 337
 338         doHashTest(numusers, numlookups);
 339         printf("Done!\n\n");
 340
 341 }
 342
 343 /* Builds the hash table up by storing passwd/shadow fields
 344  * in memory.  Returns -1 on failure, 0 on success.
 345  */
 346 int buildHashTable(void) {
 347         FILE *passwd, *shadow;
 348         char buffer[BUFSIZE];
 349         char idtmp[10];
 350         char username[MAXUSERNAME];
 351         char *ptr, *bufptr;
 352         int len, hashindex, numread=0;
 353         struct mypasswd *new, *cur, *next;
 354
 355         memset((char *)username, 0, MAXUSERNAME);
 356
 357         /* Init hash array */
 358         memset((struct mypasswd *)hashtable, 0, (HASHTABLESIZE*(sizeof(struct mypasswd *))));
 359
 360         if( (passwd = fopen(PASSWDFILE, "r")) == NULL) {
 361                 printf("HASH:  Can't open file %s:  %s", PASSWDFILE, strerror(errno));
 362                 return -1;
 363         } else {
 364                 while(fgets(buffer, BUFSIZE , passwd) != (char *)NULL) {
 365                         numread++;
 366
 367                         bufptr = buffer;
 368                         /* Get usernames from password file */
 369                         for(ptr = bufptr; *ptr!=':'; ptr++);
 370                         len = ptr - bufptr;
 371                         if((len+1) > MAXUSERNAME) {
 372                                 printf("HASH:  Username too long in line: %s", buffer);
 373                         }
 374                         strncpy(username, buffer, len);
 375                         username[len] = '\0';
 376                         if(numread < HASHTABLESIZE) {
 377                                 strncpy(allusers[numread-1], username, MAXUSERNAME);
 378                         }
 379
 380                         /* Hash the username */
 381                         hashindex = hashUserName(username);
 382                         /*printf("%s:%d\n", username, hashindex);*/
 383
 384                         /* Allocate space for structure to go in hashtable */
 385                         if((new = (struct mypasswd *)malloc(sizeof(struct mypasswd))) == NULL) {
 386                                 printf("HASH:  Out of memory!");
 387                                 return -1;
 388                         }
 389                         memset((struct mypasswd *)new, 0, sizeof(struct mypasswd));
 390
 391                         /* Put username into new structure */
 392                         if((new->pw_name = (char *)malloc(strlen(username)+1)) == NULL) {
 393                                 printf("HASH:  Out of memory!");
 394                                 return -1;
 395                         }
 396                         strncpy(new->pw_name, username, strlen(username)+1);
 397
 398                         /* Put passwords into array, if not shadowed */
 399                         /* Get passwords from password file (shadow comes later) */
 400                         ptr++;
 401                         bufptr = ptr;
 402                         while(*ptr!=':')
 403                                 ptr++;
 404
 405                         #if defined(NOSHADOW)
 406                         /* Put passwords into new structure (*/
 407                         len = ptr - bufptr;
 408                         if((new->pw_passwd = (char *)malloc(len+1)) == NULL) {
 409                                 printf("HASH:  Out of memory!");
 410                                 return -1;
 411                         }
 412                         strncpy(new->pw_passwd, bufptr, len);
 413                         new->pw_passwd[len] = '\0';
 414                         #endif /* NOSHADOW */
 415
 416                         /*
 417                     * Put uid into structure.  Not sure why, but
 418                          * at least we'll have it later if we need it
 419                          */
 420                         ptr++;
 421                         bufptr = ptr;
 422                         while(*ptr!=':')
 423                                 ptr++;
 424                         len = ptr - bufptr;
 425                         strncpy(idtmp, bufptr, len);
 426                         idtmp[len] = '\0';
 427                         new->pw_uid = (uid_t)atoi(idtmp);
 428
 429                         /*
 430                     * Put gid into structure.
 431                          */
 432                         ptr++;
 433                         bufptr = ptr;
 434                         while(*ptr!=':')
 435                                 ptr++;
 436                         len = ptr - bufptr;
 437                         strncpy(idtmp, bufptr, len);
 438                         idtmp[len] = '\0';
 439                         new->pw_gid = (gid_t)atoi(idtmp);
 440
 441                         /*
 442                          * We're skipping name, home dir, and shell
 443                          * as I can't think of any use for storing them
 444                          */
 445
 446                         /*printf("User:  %s, UID:  %d, GID:  %d\n", new->pw_name, new->pw_uid, new->pw_gid);*/
 447                         /* Store user in the hash */
 448                         storeHashUser(new, hashindex);
 449                 }       /* End while(fgets(buffer, BUFSIZE , passwd) != (char *)NULL) { */
 450         } /* End if */
 451         fclose(passwd);
 452
 453         return numread;
 454 }
 455
 456 /*
 457  * Looks up user in hashtable.  If user can't be found, returns 0.
 458  * Otherwise returns a pointer to the structure for the user
 459  */
 460 struct mypasswd *findHashUser(const char *user) {
 461
 462         struct mypasswd *cur;
 463         int index;
 464
 465         /* first hash the username and get the index into the hashtable */
 466         index = hashUserName(user);
 467
 468         cur = hashtable[index];
 469
 470         while((cur != NULL) && (strcmp(cur->pw_name, user))) {
 471                 cur = cur->next;
 472         }
 473
 474         if(cur) {
 475                 return cur;
 476         }
 477
 478         return (struct mypasswd *)0;
 479
 480 }
 481
 482 /* Stores the username sent into the hashtable */
 483 int storeHashUser(struct mypasswd *new, int index) {
 484
 485         /* store new record at beginning of list */
 486         new->next = hashtable[index];
 487         hashtable[index] = new;
 488
 489         return 1;
 490 }
 491
 492 /* Hashes the username sent to it and returns index into hashtable */
 493 int hashUserName(const char *s) {
 494      unsigned long hash = 0;
 495
 496      while (*s != '\0') {
 497          hash = hash * 7907 + (unsigned char)*s++;
 498                 }
 499
 500      return (hash % HASHTABLESIZE);
 501 }
 502
 503 int doHashTest(int numusers, unsigned long numlookups) {
 504         int i, numlookedup = 0;
 505
 506         /* Use pokey getpwnam() syscall to find users */
 507         if(usegetpwnam) {
 508                 printf("\nUsing getpwnam() lookup for %d user lookups over %d users found\n\n", numlookups, numusers);
 509                 while(numlookedup < numlookups) {
 510                         for(i=0; i<numusers; i++) {
 511                                 if(numlookedup >= numlookups)
 512                                         return 0;
 513
 514                                 if(allusers[i] != NULL)  {
 515                                         getpwnam(allusers[i]);
 516                                         numlookedup++;
 517                                 }
 518                         }
 519                 }
 520
 521         /* Use blazing fast hash method to find user */
 522         } else {
 523                 printf("\nUsing hash lookup for %d user lookups over %d users found\n\n", numlookups, numusers);
 524                 while(numlookedup < numlookups) {
 525                         for(i=0; i<numusers; i++) {
 526                                 if(numlookedup >= numlookups)
 527                                         return 0;
 528
 529                                 if(allusers[i] != NULL)  {
 530                                         findHashUser(allusers[i]);
 531                                         numlookedup++;
 532                                 }
 533                         }
 534                 }
 535         }
 536         return 0;
 537 }
 538
 539 int usage() {
 540         printf("\nUsage:  ./cachetest #lookups [-s]\n");
 541         printf("\n\tPerform 50,000 lookups using the hash method");
 542         printf("\n\tEx:  ./cachetest 50000\n");
 543         printf("\n\tPerform 50,000 lookups using the getpwnam() syscall");
 544         printf("\n\tEx:  ./cachetest 50000 -s\n\n");
 545         exit(1);
 546
 547 }