Hacking the Crypto.c Algorithm by Jitsu-Disk Object ------ This paper presents (briefly) how the Novell one-way hash function operates, and how the speed of routines may be increased. Contents : Credits Foreword The one-way hash About optimization Useless and Over-used routines Recursion vs. Iteration *Added : cipher/decipher algorithm of Remote NLM Credits ------- Credits go to : Simple Nomad, for being Simply Elite Itsme, for putting together the first Crypto routine Rich, for teaching me Novell in the first place! TheRuiner ('dreamer'), for remote NLM cipher alorithm Foreword -------- Sometimes I wonder why computers are drawing so many people to them, people that often have very different jobs and inclinations when it comes to leisure or intellectual orientation. Moreover, many of the people who teach computer-related stuff or simply design them, come from mathematical fields and believe that you just can't understand a thing if you can't solve an integral. To me this is all bullshit. Computers are built by dreamers and are incredible tools for expressing your artistic feelings; Computer science is like a sphere : you can start by exploring any part of that sphere, and by going deeper you will very soon reach the core-algorithms. You can then bounce around to any other facet of that sphere, your learning is not limited by any academic idea of how one should learn and what vocabulary should be used in this process. By comparison, math and physics are like a pyramidal : you must learn (painfully) each layer before moving on to the next one. Regarding the encryption scheme used by Novell, I want to say that I'm not in any way good at CryptoAnalysis and that the Crypto.c routine is being approached from an algorithmic standpoint only. Now lets get dirty... The one-way hash ---------------- All information provided below is assumed by regarding at the resulting crypto.c and only this. A one-way-hash function is a formula where you input a cleartext word or phrase and the output is gibberish with two interesting properties : There is no way to reverse the process (inputting the hash and obtaining the cleartext), and the odds that two different cleartexts will give the same hash is almost nonexistent. Therefore a hash is a safe way to store a password, the computer processes the password when entered using the function and compares it to the stored hash. The one-way-hash scheme we're interested in, here, processes the cleartext password in four steps. 0) On a NetWare server you can enter passwords up to 128 characters in length, however the process used to generate the password-hash takes up to 32 characters in input and gives 16 bytes as the output. Therefore, if a password is longer than 32 characters, it is divided in chunks of 32 and chunk[0] is XORed against chunk[1] the result is XORed against chunk[2] and so on until we've reached the end of the password. This process IS NOT implemented nor further discussed here. The reason is simple : no one holds enough computer power to brute force such long passwords (given the algorithm below). 1) The password string is repeated enough times to feed a 32-byte length string (crypt2data[0->31]) and between each sequence of the password in crypt2data[] a byte is inserted from a fixed table called crypTab[]. 2) Each byte in buf[32] is XORed with the unique IDentification number attached to the user who owns that password, giving crypt2data[32]. note : At end of step_2 the ciphered crypt2data[] is still reversible to give the cleartext by XORing it again with the ID 3) A special non-linear function lengthens crypt2data[32] to crypt2data[64] 4) Finally the Password hash is 16 byte length, assuming Z[] is crypt2data[], nybTab a fixed table of nibbles, we have : hash[0]= nybTab[Z[64]] OR nybTab[Z[65]<<4] hash[1]= nybTab[Z[66]] OR nybTab[Z[67]<<4] . . . hash[15]= nybTab[Z[94]] OR nybTab[Z[95]<<4] note : Z[65]->Z[95] are processed using the same formula as in step_3 In a minute we're gonna go deeper into those four steps, but 2 very ESSENTIAL points must be stated : * The hash is a 16byte word where each byte is built upon a crypt2data[] pair because the nybTab is made of bytes with all the 4left-bits set to 0: nybTab = {0x7, 0x8, 0x0, 0x8, 0x6 ...} nybTab[high] nybTab[low] high=Z[64+2*i] low=Z[64+2*i+1] hash[i]= |____4 left bits_______||____4 right bits______| ex : if i=0 and Z[64]=0 and Z[65]=4 then hash[i]=0x76 * If the non-linear function is reversed, the "nybTab" would render this, break-through almost useless: if you study its contents you'll see that a set of 16 bytes are repeated various times (about 9 to 20) in a 265 byte length table, so that if you tried to reverse the hash starting from this table, you obviously had to perform the test at most 32power16 (rough average) Therefore, if the user used a password longer than 9-11 characters (rough average) you'd be better off trying to reverse the hash (if you could)... Besides, if the inventors of the Algorithm wanted to introduce some "BackDoor" (statistical trend known only to themselves) the nybTab would be a very good spot... (Note : unless you've got the computing power of the NSA, trying to brute force or reverse a password longer than 8 is useless). Step_1) There isn't much more to say about it, just to note that the purpose of this lengthening thing is to provide enough "space" for the formula of step_3 to play with. Besides the introduction of bytes from a fixed table throws in a little more entropy, and finally the longer a password is the less time it is repeated in crypt2data. Step_2) You may by asking yourself "What if some people with computers holding enough RAM (or organizing their data in B-Tree) generated all the hashes possible so that they would only have to compute them once". Step_2 is the reason why this can't happen : XORing with the ID generates a unique hash to this user since ID's are unique. You have to bear in mind that ID's are fixed-length (4Bytes) but passwords have variable lengths therefore, if you analyze crypto.c you will notice a little formula to match the XOR between the correct bytes of crypt2data[] and the ID. ex : Say we initialized the brute force with a 5-character length password, AAAAA. On the first turn is AAAAA processed to feed the 32 first bytes of crypt2data (Z), on the second turn we try AAAAB, therefore we only need to process the last byte of the password "B" in every occurrence of Z : { Byte introduced form crypTab (cTb[5], cTb[11]...) { AAAA XORed on turn_1 and also XORed on turn_1 { ___|___ | _1_{ |_|_|_|_|B|x|_|_|_|_|B|x|_|_|_|_|B|x|_|_|_|_|B|x|_|_|_|_|B|x|_|_| BYT{ 0 1.....4...........10..........16..........22..........28....31 { { |_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _|_ _ _ _| BYT{4ID_Bytes 0 2 0 2 0 { | | _2_{ --> Z[4]="B"XorID[0]................................Z[28]="B"XorID[0] Step_3) Here comes the fun, the non-linear formula is used to generate a non- reversible hash. This formula must have a 'good' property : it must be strong against CryptoAnalysis (i.e. you can't use some kind of statistical trend against the hash to sufficiently narrow the search upon a limited key-space). From the analysis of encryptp here's what comes out : We have two series U and Z, where U[]=Crypt_2_c_val[] (64 byte lengh) and Z[]=Crypt2data[] (96 byte lengh) We have one table of fix data D[]=CrypTab[] (32 byte lengh) U is defined by U[i]=U[i-1]+Z[i+32] Z is defined by { let i'=i&0x1f { let k =(U[i-33]+i')&0x1f { { if k31] in step_2 So this boils down to (understanding i&0x1f <=> i modulo 32) : { let i be the current iteration, i=0..63 { { { Z[i+32]=( Z(U[i]&0x1f+i) - D[i&0x1f] ) XOR ( Z[i] + U[i] ) { { { { U[i+1]=U[i]+Z[i+32] { { with U[0] = Z[32] = (Z[0]-D[0]) XOR Z[0] { { { Note : We don't need the "memory" provided by the use of a table for the { series U, therefore we could replace U[] by a variable U, and { U[i+1]=U[i]+Z[i+32] by U += Z[i+32] A few things have to be said here, first we can see that Z[] depends on U[], and U[] depends on Z[]. Therefore it is not possible to come up with a formula that could give us U[i] (and therefore Z[i]) without calculating U[i-1] first. This idea is reinforced by the use of a XOR in the formula, since the XOR operation has no "good" arithmetical properties allowing any simplifications. Second, Z[] is not linear and therefore U[] isn't either, and from some simple tests performed, it doesn’t look like i' or k can be easily guessed. Finally, the formula relies on a fixed table (CrypTab) to throw in a little more entropy, so two passwords right next to each other (say AAAA and AAAB) give a very different hash. It is to be said about this last point, that if the inventors of this algorithm wanted to introduce some kind of "BackDoor", the fixed table would certainly be another good spot, this would result in a thing called "collision" where two different passwords give the same hash. ** Added note : well, a collision exists in this algorithm !!! This was proven and an exploit coded by P.Semjanov. You can get the source code to this in the Novell section of : http://194.85.96.197/psw/crack.html. Yet they are better means to defeat the cryptic mechanism : we can use the hash directly, see the "PanMount" for Linux on Pandora's download page and details in the "Novell Cries Pandora" paper. Step_4) The aim of this last step is to shrink the 4Bits pairs of nybTab[Z[]] to give 8Bits words (so it ain't ugly like Unicode ASCII) and to make the word "one-way" meaningful... Note : It is obvious that there is no need to compare each Byte of the hash each turn, in fact, comparing the low 4-Bits of hash[0] with Z[64] proves to be a good test (i.e. rejected) 90% of the time. About Optimization ------------------ Although the aim of optimization is obvious : accelerating the execution of a program, the path to it is not always so obvious. First, let’s consider a few facts : in your computer the "slowest" device is your hard-drive, next (not considering the various add-on cards) comes your RAM, then Cache Memory and finally the processor. Therefore minimizing disk access is a top priority (the number of attempts the brute-force performs before saving things to disk was HIGHLY increased) (could someone spread the word to Micro$oft ?), and under certain conditions it is sometimes preferable to have the processor do multiple times the same simple operation than storing the result in a variable and calling the memory stack every time its needed. Second, we will assume you're using an Intel Pentium processor, or a RISC processor. One of the interesting facts about these is that almost every assembly instruction uses 1 computer cycle to execute (i.e. if you have a 100Mhz processor, you’re computing at most 100 million instructions per seconds (not considering pipeline)). So the question when you're programming in something else than assembly isn't much "In what assembly instruction is the compiler gonna transform this ?", but rather "How many assembly instructions is the compiler gonna come up with for this ?". Therefore, if you want your program to run fast, 1) Use a low-level programming language (i.e. close to the actual assembly code, like the C language) 2) Always turn the compiler optimize options ON for better results. Finally, maybe the most important of all is how your program is structured. I always try to keep two questions in mind, where the first one prevails on the second (because the answers might go in opposite directions): 1) "Does the program perform only the minimum needed to reach our goal" 2) "Is this the simplest (or easily read) code possible" One good example of how these two questions can have opposite answers is found in the XorwithID() function. The function itself is pretty clear but the trouble is that it performs the XOR thing on all the 32 first Bytes of Crypt2Data each turn, whereas only a few bytes actually changes each turn (see the diagram of step_2). So the answer to 1) is "No" because you'd save time only by changing the bytes that need it, but the answer to 2) is "Yes" because obviously the XorwithID function doesn't go into the complications of calculating the position of the bytes that really need to change. Another example that answer 1) is the way that the testing of the one-way hash was approached : previously all bytes were processed and tested against the original hash; in the modified version each brute-forced byte is processed independently and you only move to the next if the current one matches its original hash counter part. This went even further because now the first byte of the hash is first tested for its 4-last bits (see step_4 for details). One last thing : during the optimization process, it is wise to start with optimizing the "deepest" functions, that is to say the ones that are repeated the most frequently and to end up with the ones that are processed scarcely. Useless and Over-used routines ------------------------------ Sometimes during the development process of a program, you come up with plenty of small routines that you'd thought be more important than they actually are. These sub-routines should be included directly in the ones that call them as long as the clarity of the code is preserved and this doesn't cause the code to be too redundant. Redundancy is ugly, yes, but it gives faster code because calling a sub-routine is always time consuming, whatever the size of the sub-routine, and this increases with the number of local variable in it, and worsen if it’s a very "deep" sub-routine. Moreover, these sub-routines prevent from easily seeing two things : two or more sub-routines could be merged and a sub-routine might even prove useless. For instance Shrinkbuf() and Crypt2() were merged within the "for" loop of encryptp(), preparepw() and XorwithID() where modified and moved to the main() (so that they were on a less "inner" level) and finally fowardcrypt(), InitRevcalc() and revcrypt2() were simply removed. Recursion vs. Iteration ---------------------- Recursion happens when a function is defined by itself. Ex : function Get_Drunk(where_to_go) {If Alcohol_Test is positive return home; Drink(Same_as(the_guy_next_to_you)); If (the_guy_next_to_you=drunk) then Get_Drunk(another_bar) else Get_Drunk(same_place) } A recursive function usually needs a "seed" to start with (Get_Drunk(bar)), but depending on your goal you will be looking for either a solution to a problem (forward recursion, i.e. : how can I get drunk) or the problem which a specific solution solves (reverse recursion, i.e. : what steps it took me to get drunk before I got home). If either your problem or your solution is unknown to you, and you came up with a recursive algorithm to solve it, it may just be that there just isn't any other way to do it. If there is another way that means using an Iterative process where you can clearly define a beginning and an end to it. One of the particularities of a recursive algorithm, is that usually you can't say WHEN it will stop and worse IF it will stop. Moreover, recursive means a function call each turn, and there's nothing more CPU consuming than a function call. The good news is that if you know the recursive algorithm always comes up with a solution and in a determinable or even better fixed number of turns, then you're assured there is an Iterative equivalent to your Recursion. Some will argue that you can always turn a recursive function in an iterative loop by stack saving the variables each step in the loop, but this doesn’t give good optimization. Regarding the crypto.c, we had a two-level recursive call between calc_c and calcElement, but some simples tests proved that they always gave their solution in a fixed number of turns. Therefore the first thing was to track down the "seed", by looking at how the algorithm gave U[0] (=Z[32]), and then crack down one after the other the two functions to their Iterative equivalent. Cipher/Decipher Algorithm of Remote NLM, by TheRuiner ('dreamer') ----------------------------------------------------------------- 'dreamer' cryptographically cracked Novell's Remote.NLM password encryption algorithm. It is a very weak algorithm compared to what Novell has implemented in NDS, as it is instantaneously decryptable. RConsole password encryption is different from Remote.NLM password encryption because: 1) Encrypted RConsole passwords are sent across the wire, via RConsole. Remote.NLM's encrypted passwords are generated at the server console by typing REMOTE ENCRYPT MyPass, and they are optionally stored in SYS:System\LDRemote.NCF. 2) They use a different password encryption algorithm. RConsole passwords are encrypted with information from the workstation that is currently running RConsole. Remote.NLM passwords are encrypted with a time byte, one of 255 constants in a hash table, appended characters, some XORing, and bit-order separation. 3) Encrypted RConsole passwords are locally obtained with a packet sniffer, but Remote.NLM passwords are remotely accessible to anyone with the ability to view SYS:System\LDRemote.NCF. The Remote.NLM passwords are decrypted using only five steps. To encrypt, simply reverse the steps. The password will look something like this: AF8CBBF48CA9955F5ADAFDADAA23 The structure of the password is as follows: AF8CBBF48CA99 55F5ADAFDADAA - 23 The first section contains the low-order bits, and the second, the high-order bits. 23 is the time byte, which is decremented by the server once every two seconds, from FF to 02, then back up to FF, etc. Step 1) Realign the low-order bits and high-order bits ======================================================= This is extremely simple to do. The high-order bits are in order from the first character to the last, and so are the low-order bits. Example: Password: AF8CBBF48CA99 - 55F5ADAFDADAA, Output: 5A 5F F8 5C AB DB AF F4 D8 AC DA A9 A9 At this point, ignore 5A 5F F8 5C, or the first four bytes. They are appended somewhere during encryption, and decrypt to "%*@$". It was a TERRIBLE idea for Novell to implement those four characters into every single password, as those are what helps rebuild their hash table from scratch. Also, if the length of the password is 10, the password is automatically decryptable to nul. Step 2) ======= Match each of the password characters (group of two hex characters) to the hash table below. Use their position from the beginning of the table to determine the value of the pre-hash encrypted password. Example: F4, the 8th character of the password, matches the hash table at 95. This means that 95 is the pre-hash value of F4. Thus far, (ignoring the first four characters) the password was: AB DB AF F4 D8 AC DA A9 A9 and now the password is: 98 A0 9B 95 A1 9D A6 9C 9C Remote.NLM Hash Table 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 00 5B 58 5E 5F 59 5C 5A 5D-73 70 76 77 71 74 72 75 10 13 10 16 17 11 14 12 15-7B 78 7E 7F 79 7C 7A 7D 20 53 50 56 57 51 54 52 55-03 00 06 07 01 04 02 05 30 1B 18 1E 1F 19 1C 1A 1D-0B 08 0E 0F 09 0C 0A 0D 40 2B 28 2E 2F 29 2C 2A 2D-63 60 66 67 61 64 62 65 50 83 80 86 87 81 84 82 85-3B 38 3E 3F 39 3C 3A 3D 60 8B 88 8E 8F 89 8C 8A 8D-33 30 36 37 31 34 32 35 70 93 90 96 97 91 94 92 95-6B 68 6E 6F 69 6C 6A 6D 80 9B 98 9E 9F 99 9C 9A 9D-A3 A0 A6 A7 A1 A4 A2 A5 90 F3 F0 F6 F7 F1 F4 F2 F5-AB A8 AE AF A9 AC AA AD A0 DB D8 DE DF D9 DC DA DD-FB F8 FE FF F9 FC FA FD B0 23 20 26 27 21 24 22 25-B3 B0 B6 B7 B1 B4 B2 B5 C0 CB C8 CE CF C9 CC CA CD-BB B8 BE BF B9 BC BA BD D0 C3 C0 C6 C7 C1 C4 C2 C5-D3 D0 D6 D7 D1 D4 D2 D5 E0 43 40 46 47 41 44 42 45-E3 E0 E6 E7 E1 E4 E2 E5 F0 4B 48 4E 4F 49 4C 4A 4D-EB E8 EE EF E9 EC EA ED Step 3) ======= Subtract the length (the number of groups of hex characters, excluding the time character) of the full password from each encrypted password character. Now you have the ACTUAL pre-hash encrypted password. If the subtracted value is negative then simply continue from FF down to the negative value. Example: if the password character is at 04, and the length is 6, the value of the password character will be FF. The length is 13 (D in hex), so the password was: 98 A0 9B 95 A1 9D A6 9C 9C and is now: 8B 93 8E 88 94 90 99 8F 8F Step 4) ======= Get the time var, in this situation 23 (hex), and subtract it from FF. This new character is for use in Step 5. Example: FF-23=DC. Step 5) ======= Finally, XOR each character (group of 2 hex characters) of the encrypted password with the new time character, and you now have the decrypted password! The password was: 8B 93 8E 88 94 90 99 8F 8F (before the XOR) Now, the decrypted password is: 57 4F 52 54 48 4C 45 53 53 "WORTHLESS" The code to this algorithm was added to the Pandora API in Pandora rel.4 ______________________________________________________________________________ I hope all the crap pulled here proves some kind of interest (and criticism). The result of all this is a speed increase of about 76 times compared to the first release of the project. Have Fun. Jitsu-Disk (jitsu@nmrc.org).