Wednesday, August 21, 2024

Korelogic's CMIYC 2024 @ DEF CON 32 Write-up

Members:

AMD, blazer, gearjunkie, golem445, hops, meowmeowbean, s3inlc, waffle, winxp5421 

usasoft (infra)

Introduction

A huge thanks to KoreLogic for once again hosting the annual premier password cracking contest. This year we finally saw the appearance of Argon2 hashes, along with other weird and wonderful algorithms such as saph512, RC2, gocryptfs and broken SHA1s. Congratulations to HashMob for topping the scoreboard with 4.89 billion points, with Team Hashcat hot on their heels at 4.83 billion; we came in third with 2.65 billion points. It was refreshing to see the wild newcomer team_vodka, who very nearly took third place from us. Congratulations also to the street team “ThatOnePasswordWas40Passwords”, which topped the street scoreboard with almost 7x the points of the runner-up. This year there were a multitude of challenge files in addition to the initial drop of hashes, keeping us on our toes throughout the whole 48-hour duration.

Hashes

SM3crypt / saph512 / Azure

SM3crypt and saph512 modules were written for hashcat very early in the contest, allowing us to compute these entirely on GPU. Some time was wasted re-implementing the Azure hashes: due to format differences, it took us a while to realize that a module for the format already existed. Because we had GPU implementations so early, we cracked more SM3crypt and saph512 hashes than any other team.

Broken SHA1

These were chopped up into smaller parts for matching in mdxfind, which supports variable hash lengths, and later re-assembled. The points-to-work ratio did not appear worthwhile, so these were deemed low priority.

Argon2

We found that if we removed the shiro prefix and flipped the m and t variables, we could load these into John the Ripper and crack them on GPU. It is also worth noting that cyclone’s argon2 cracker works with these as well. A small but important gotcha is that cyclone’s cracker may not report the crack on some systems. We were unable to replicate this issue post-contest (possibly PEBKAC; apologies, cyclone).
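
For illustration, a minimal sketch of the strip-and-swap rewrite; the field layout below is from our contest notes, not any official spec:

```python
import re

def shiro_to_phc(line: str) -> str:
    """Rewrite a Shiro-prefixed Argon2 hash into standard PHC form.

    Assumed input layout (our contest notes, not a spec):
      $shiro2$argon2id$v=19$t=1,m=65536,p=4$<salt>$<digest>
    Output accepted by stock crackers:
      $argon2id$v=19$m=65536,t=1,p=4$<salt>$<digest>
    """
    fields = line.strip().split("$")      # leading '$' -> fields[0] == ''
    if fields[1].startswith("shiro"):     # drop the shiro marker
        del fields[1]
    # Swap the t= and m= parameters so standard parsers accept the hash
    fields[3] = re.sub(r"t=(\d+),m=(\d+)", r"m=\2,t=\1", fields[3])
    return "$".join(fields)

print(shiro_to_phc("$shiro2$argon2id$v=19$t=1,m=65536,p=4$c2FsdA$aGFzaA"))
```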

Radmin3

A bash script was created to loop radmin3_to_hashcat.pl over the registry file, parsing it into crackable lines. We noticed this produced several faulty lines, but we did not have time to investigate them.

Challenges

Challenge 1

ARJ, a bizarre archive format that most of us weren’t aware of. We tackled these using various tools, from Elcomsoft’s archive recovery tool to an ancient cracker that required someone to boot up a 32-bit Windows XP system to run it. Other than spotting a UT8-pass in the list and tracing through each note file leading on to the next ARJ, we did not find a use for this archive. We tried to pair up the usernames with those from other known hashes, but were not able to correlate them. We also tried pairing the ARJ passwords with the passwords in the notes, in different orders, and tested these against the bcrypts as well. We wasted a considerable amount of time here without it really leading anywhere.

Gocryptfs

A gocryptfs cracker was developed; however, we were not able to get any cracks for this challenge. We also had a member go down the rabbit hole of known gocryptfs weaknesses, only to later realize that the version used for CMIYC had patched the flaws found in the published gocryptfs security audit.

Challenge 2

We were able to decrypt the pcap file and crack the password. We then used the password to decode the data packets and reassemble the various data streams. This allowed us to intercept various communications, including basic HTTP/FTP and email traffic, from the TCP/IP streams.

We were not able to make use of the decoded data.

Pybcryptk.py

We completely overlooked this file and did not make use of it to identify the underlying algorithm until quite late in the contest.

Challenge 3

While we were able to open the initial zip, which contained a multitude of further zips, we did not crack any of them.

Challenge 4

We created an RC2 cracker, but were not able to apply any hints or correlated data to the attack early on. We later made some solves using crcmatch.

Challenge 5

We made use of the success/failed attempts in app.log to assist in making solves for various logins. Upon closer inspection of the log file, there was a CRC checksum for the successful logins. Through trial and error, we identified the correct CRC algorithm used (CRC8). This hint also gave us a partial mask for a subset of hashes.

Knowing the CRC8 checksum meant we could speed up the cracking of each login from the log file by up to 256 times. Even with a mask and checksum, brute-forcing the unknown portion would still take considerable time to cover a ?a?a?a?a mask for the slow bcrypt/argon2 algorithms.
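
A minimal sketch of that checksum pre-filter, assuming a plain CRC-8 with polynomial 0x07 (the contest's exact variant was found by trial and error); the prefix, suffixes and checksum below are made-up examples:

```python
def crc8(data: bytes, poly: int = 0x07, init: int = 0x00) -> int:
    """Bitwise CRC-8; polynomial 0x07 is an assumption."""
    crc = init
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc

# Hypothetical prefix and checksum lifted from one log line
prefix, known_crc = "Summer", 0x3C
suffixes = ["2024!", "2024$", "1999#"]

# Only ~1/256 of candidates survive, cutting the expensive bcrypt/argon2
# work by up to 256x.
survivors = [prefix + s for s in suffixes
             if crc8((prefix + s).encode()) == known_crc]
print(survivors)
```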

A new tool called crcmatch was created: it accepts an input stream, pairs it with the half-mask input as a secondary list, and outputs the full plain if the CRC matches. It was at this time, roughly 2 hours before the end of the contest, that the obscure bcrypt $2b$ algorithm was identified via a Go implementation (bcrypt of base64(HMAC-SHA256(salt, password))). This was quickly reverse-engineered and implemented into mdxfind. With around 1.5 hours left, we were able to perform a “likely brute force” of the suffix. This involved extracting and/or generating a set of mask suffixes which we believed would have a high probability of success; examples include 19?d?d?s, 202?d?s and ?l?l?l?l?s. Piping the output of crcmatch into mdxfind allowed us to solve the highest number of bcr256 hashes.

Right after the contest ended, we also figured out the algorithm for the other elusive bcr256 $2y$ hashes: bcrypt with a truncated salt, where the password is HMAC-SHA256'd using the untruncated salt as the key. The salt can take only 1 of 4 possible values, so a build of mdxfind was created which cycles through all 4, rebuilds the salt, and tries the 4 variants for each plaintext.
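
A rough sketch of how such a candidate check could look under our reconstruction of the scheme; the salt values and helper names are illustrative, and this is our post-contest reading, not a spec:

```python
import base64, hashlib, hmac
import bcrypt  # pip install bcrypt

def bcr256_check(candidate: str, salt: bytes, stored_hash: bytes) -> bool:
    """The 'password' handed to bcrypt is assumed to be
    base64(HMAC-SHA256(key=untruncated_salt, msg=candidate))."""
    inner = hmac.new(salt, candidate.encode(), hashlib.sha256).digest()
    pseudo_password = base64.b64encode(inner)
    return bcrypt.checkpw(pseudo_password, stored_hash)

# With only 4 possible salt values, each candidate is simply tried
# against all 4 (values below are made up).
POSSIBLE_SALTS = [b"salt-a", b"salt-b", b"salt-c", b"salt-d"]

def try_candidate(candidate: str, stored_hash: bytes) -> bool:
    return any(bcr256_check(candidate, s, stored_hash)
               for s in POSSIBLE_SALTS)
```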

Patterns

We largely based our attacks on data extracted from various RFC resources, namely RFC 1000, 1002, 1008, 1077 & 1096, as these appeared to produce consistent founds across almost all the hash lists. For the Azure hash list in particular, we found that replacing the space character with a symbol in 2-, 3- and 4-word combinations yielded founds.
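
A minimal sketch of that candidate generator; the symbol set and the word-splitting regex are our assumptions:

```python
import re

SYMBOLS = "!@#$%"   # our guessed symbol set, not KoreLogic's list

def ngram_candidates(text: str, sizes=(2, 3, 4)):
    """Emit word n-grams with the joining spaces replaced by one symbol."""
    words = re.findall(r"[A-Za-z']+", text)
    for n in sizes:
        for i in range(len(words) - n + 1):
            for sym in SYMBOLS:
                yield sym.join(words[i:i + n])

# e.g. feed one of the RFC texts named above
with open("rfc1000.txt") as fh:
    for candidate in ngram_candidates(fh.read()):
        print(candidate)
```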

Similarly, the above pattern also worked on texts such as “Snowpiercer” and “Seven Samurai”, which were themes for this year’s DEF CON. While we noticed a “Star Trek” theme present, we did not spend much time pursuing it to identify the exact pattern. We largely used an internal version of PACK2 to generate n-grams from the various text excerpts listed above for our attacks.

What went wrong

While we had the compute resources, knowledge, developers and crackers present, we failed to see the various challenges through to solves. It almost felt like we were inundated with challenges and new tasks; before we could fully catch up, another challenge was dropped. We were unable to make meaningful correlations from the challenges we had solved, and we spent too much time building tooling for new challenges while simultaneously trying to piece together the solved ones. We also did not extensively test the zip files or pay attention to warning messages, which hindered our ability to leverage the hints they provided.

Closing remarks

As the famous Michelangelo once said, “Ancora imparo” (“I am still learning”). Despite having competed for many years, our team still learns something new every contest. While some of the challenges were quite frustrating, it was overall a fun experience. If you are a like-minded individual and want to contribute to our team, reach out.
Join_US | Contact_US @CynoPrime


Tuesday, August 15, 2023

Korelogic's CMIYC 2023 @ DEF CON 31 Write-up

Members that participated (10 crackers / 1 support)

  • s3in!c

  • golem445

  • hops

  • blazer

  • gearjunkie

  • winxp5421

  • AMD

  • cvsi

  • pdo

  • Waffle

  • Usasoft (support)

Peak computing power: 25-30 GPUs (standardized to RTX 4090 equivalents)

Before the contest

The test hashes gave us a glimpse of what the potential hash data would look like. After successfully cracking all the test hashes, we noticed very heavy use of UTF-8 encoded characters. We ensured we had adequate tooling to handle UTF-8 strings and detect character sets, and expanded our toolkit to leverage translation APIs for batch language translation. We also created tooling to parse the YAML data into more usable and manipulable formats.
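
As an example, a minimal sketch of such YAML flattening, assuming a made-up layout of users with username/hash/department fields; the real contest schema differed:

```python
import csv
import yaml  # pip install pyyaml

# Assumed layout: {users: [{username: ..., hash: ..., department: ...}]}
with open("contest.yaml") as fh:
    data = yaml.safe_load(fh)

# Flatten into a CSV that downstream tooling can slice and join on
with open("contest.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["username", "hash", "department"])
    for user in data.get("users", []):
        writer.writerow([user.get("username"), user.get("hash"),
                         user.get("department")])
```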

During the contest

The very first issue we encountered was the {ssha512} hashes: since hashcat outputs these as {SSHA512}, we had to quickly update our system to translate between the two cases.
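
A tiny sketch of the kind of translation involved (a hypothetical normalizer, not our actual pipeline code):

```python
import re

def normalize_scheme(line: str) -> str:
    """Lowercase the {SCHEME} tag so hashcat output ({SSHA512}...)
    matches the contest's {ssha512}... submission format."""
    return re.sub(r"^\{[A-Za-z0-9]+\}", lambda m: m.group(0).lower(), line)

print(normalize_scheme("{SSHA512}c2FsdGVkaGFzaA=="))  # -> {ssha512}...
```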

We identified the timestamp pattern early on, and it was used to quickly gain cracks on bcrypt. The metadata threw us off a little initially, as we were not totally sure how it was incorporated into the plaintext: early on we were unsure whether it was merely a hint or whether the plaintext contained a portion or manipulated form of the metadata. Due to the insanely slow hashrate of bcrypt and sha512crypt/sha256crypt, it took our team quite some time to gather enough samples to deduce that the plaintext patterns were distributed evenly across all the algorithms.

Twelve hours into the contest, we were quite perplexed at how other teams were able to consistently yield cracks for bcrypt while we appeared to be getting nowhere. Once the plains were analyzed, we identified some movie lines. We initially started off with Star Wars movie lines, then progressed to Star Trek movie lines; the following critical patterns were identified.

Lines containing 2, 3 or 4 words were extracted based on word boundaries from movie scripts/subtitle files, then divided into various lengths for attacks (see the sketch after the list below):

  • Len13?s (where s = !@$%)

  • Len12?d?s (where s=!@$%)

  • Len14+ suffix 1
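
A rough reconstruction of how phrases could be binned by length into those patterns; the exact cut-offs and suffix semantics are our assumption:

```python
SYMS = "!@$%"   # the symbol set noted above

def candidates(phrase: str):
    """Bin an extracted phrase by length and append the matching suffix."""
    n = len(phrase)
    if n == 13:                      # Len13?s
        for s in SYMS:
            yield phrase + s
    elif n == 12:                    # Len12?d?s
        for d in "0123456789":
            for s in SYMS:
                yield phrase + d + s
    elif n >= 14:                    # Len14+ suffix 1
        yield phrase + "1"

print(list(candidates("never tell me")))   # a 13-char phrase
```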

We initially were not totally sure whether all the symbols were used or what the specific attributes were, so some resources were wasted checking these. However, as the runs progressed, we were able to reduce the keyspace by tightening our parameters, such as using only certain suffix patterns with certain lengths.

Once we knew the correlation between the hash sets, it was merely a game of attacking the fastest algorithm (MD5), then filtering the attacks through all the algorithms to maximize points. Since we used a much larger repertoire of movie lines and corpora on the faster hash types, we used the obligatory ChatGPT to identify the origin of the phrases. This involved converting our cracks to base words and asking where the phrases came from. Once the sources were identified, we manually gathered the movie lines/srt files and processed them as described above. Very large tasks were spawned to cover exactly those patterns, which gave us consistent cracks throughout the contest. The films we identified are listed below.

  • 2001 A Space Odyssey

  • Alien series

  • Army of Darkness

  • Battlestar Galactica

  • Blade Runner

  • Close Encounters of the Third Kind

  • Contact

  • Dune

  • Event Horizon

  • Ex Machina

  • Firefly

  • Ground control

  • Guardians of the Galaxy

  • I, Robot

  • Inception

  • Interview with the Vampire

  • Mad Max

  • Minority Report

  • RoboCop

  • Star Trek

  • Star Wars

  • The Day the Earth Stood Still

  • The Expanse

  • The Fifth Element

  • Galaxy Quest

  • The Hitchhiker's Guide to the Galaxy

  • The Matrix Trilogy

  • The Terminator

  • The Thing

  • The War of the Worlds

  • Tron

Since many plaintexts were recovered using this method, we will add some additional information. We suspected there was a parsing defect (it was disclosed in the discussion afterward that delimiting on punctuation characters was used, which explains the obscure behavior we noticed).

When parsing our datasets, we delimited only on spaces rather than using a sliding-window method, ensuring phrases kept full word boundaries, e.g. “something was here” instead of “omething was here”. However, we noticed cracks containing phrases that were not “word bounded”; they appeared like this:

“t go there today”

“ve got to be here”

For some reason, it did not occur to us that the phrases could be delimited on punctuation. Instead, we took a different approach and emulated the behavior: we took our existing phrase lists and prefixed d/t/s/m or ve/re while maintaining length constraints, which also gave us very good results. Importantly, since we were dealing with long lengths, “-O” had to be turned off when used with sha512crypt/sha256crypt due to its length-15 limitation.
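
A minimal sketch of that emulation; the length bounds are illustrative:

```python
# The tails left behind when a contraction is split on its apostrophe
TAILS = ["d", "t", "s", "m", "ve", "re"]

def tail_variants(phrase: str, min_len=12, max_len=16):
    """Prefix contraction tails onto existing word-bounded phrases,
    keeping length constraints (the bounds here are our guess)."""
    for tail in TAILS:
        candidate = f"{tail} {phrase}"
        if min_len <= len(candidate) <= max_len:
            yield candidate

print(list(tail_variants("go there today")))  # -> ['d go there today', ...]
```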

We crafted some additional tooling during the contest to visualize and query the datasets with standard SQL-like queries (yay for SQLite). All our new and existing plaintext cracks, along with all the metadata, were associated and synced constantly. This tooling played a pivotal role not only in creating -a 9 association attacks, but also in helping us identify patterns and create hash subsets (a sketch follows the list below). Some of these patterns included:

  • #3&4%# only applied to Telecom users

  • Russian users with Russian words and suffix ‘1’

  • Icelandic phrases for users with Icelandic names

  • Ghosting users with the hinted passwords

  • Company names with word suffix

  • Saleswords + prefix/suffix rules for sales team
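
A minimal sketch of the kind of query this enabled, with a made-up schema; the real tool and its column names were contest-specific:

```python
import sqlite3

con = sqlite3.connect("cracks.db")
con.execute("""CREATE TABLE IF NOT EXISTS cracks
               (username TEXT, department TEXT, country TEXT, plain TEXT)""")

# e.g. hunt for the "suffix '1'" pattern, grouped by department
rows = con.execute("""SELECT department, COUNT(*) AS hits
                      FROM cracks
                      WHERE plain LIKE '%1'
                      GROUP BY department
                      ORDER BY hits DESC""").fetchall()
print(rows)
```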

Being able to cut down the salt list for the slow algorithms meant we did not actually need a lot of computing resources; we would strategically target subsets of hashes when we suspected a pattern. It was clear that a lot of effort was put in by Korelogic in generating the dataset: Japanese users had Japanese passwords, users with UTF-8 encoded names usually had UTF-8 encoded passwords, and even a user's region determined the password, such as Indian users having Hindi passwords.

We were able to decode all the hints. However, at times we read too deeply into them, and they almost sent us down rabbit holes, throwing us off.

Our workflow was roughly as follows: when large attacks were crafted and dispatched, users who felt like joining would participate, while others continued running other attacks and discovering new patterns for analysis. We also ran translations of the base words through various languages and tested repeatedly to see if we could spot new languages.



We did not have designated roles; the team just played off everyone's strengths. For example, we initially found many of the word phrases via a large generic Reddit comments wordlist, which was quickly distributed among us; this then progressed to movies, and from there we identified the subset of movies. Due to the short phrases, it was not easy to determine their origin early on, as they could have come from anywhere, though we were eventually able to piece things together. We worked collectively to gather the various resources, and once ready, these resources and tasks were distributed via our job management system to whoever opted in. During this entire time, various scripts/tools/ideas/automation/platform updates were made on the fly to ensure we were working with optimal resources and efficiency.

Things we missed

We were only able to partially solve the prod/dev [separator] hint. We found the following separators: (-_| %20). Sadly, we only tested the other URL encodings in uppercase hex format (%2F instead of %2f), so we missed numerous cracks here.
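
A small sketch of the separator expansion we should have run, emitting both hex cases; the separator list is illustrative:

```python
from urllib.parse import quote

SEPARATORS = ["-", "_", "|", " ", "/"]   # "/" is illustrative

def encoded_variants(sep: str):
    """Emit the raw separator plus both upper- and lowercase
    percent-encodings, the case we originally forgot to cover."""
    enc = quote(sep, safe="")
    return sorted({sep, enc.upper(), enc.lower()})

for sep in SEPARATORS:
    print(repr(sep), encoded_variants(sep))
```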

We did spend some time on the CRC hint: we parsed the CRC handbook, scoured the web for chemical compounds, and wrote scripts to generate carbon chains. While we did get a few cracks, it did not appear that significant, or maybe we were not able to find the right sources.

We did not use a large repository of books; our movie list gave us enough work to compute through, and it most likely overlapped with the books.

We did see some streets/roads early on but forgot to pursue this further.

We noticed some mathematical formulas as passwords, though we did not look too deeply into this.

We had a member suggest the use of a dictionary labeled ‘polkski_dict’ (which appeared to contain a random assortment of things) very early on. It was far too big to test across the tougher algorithms, so we put it aside and forgot about it until near the end, when we were able to cut down its contents and found it decent at producing founds; we were not able to fully exhaust this dictionary due to time limits.

Take-away

While a large number of compute resources certainly makes a difference, if you simply throw hashes and random lists/attacks at the situation and hope for the best, more likely than not the outcome won't be desirable. Identifying patterns and sources, then optimizing the attack parameters, such as cutting down salts by attacking a subset of hashes or using a specific list/rule set, helps dramatically; ensuring the workload saturates the compute cores is also critical.

All in all, our team as usual had very little sleep and a wonderful time solving the challenges and competing against the best teams. It was great to be able to use correlated data in hash cracking. We can only imagine the thought and effort involved in creating the challenges along with hints and finally wrapping it all up in a nice, well-run contest. Kudos to Korelogic.


Saturday, August 20, 2022

Korelogic's CMIYC 2022 @ DEF CON 30 Write-up

What a breath of fresh air to have DEF CON 30 not be canceled this year. We are thankful that Korelogic’s CMIYC is running strong 13 years in. Going into this contest we assumed the competition would be quite fierce; however, we are always up for a fair challenge. As most members of our team are hobbyist password crackers not working in the cybersec industry, this was a wonderful opportunity to dust off and re-paste our GPUs.

Our roster this year comprised 9 active crackers, 1 part-time cracker and 2 others providing ancillary support. The members were AMD, blazer, cvsi, gearjunkie, golem445, hops, MXall0c, Waffle, s3in!c and pdo, with winxp5421 and usasoft playing support roles in our comms and hash management systems.

Our hardware list consisted of:

 Confirmed

      1x 3080

      3x 2080 TI

      2x 2080

      4x 1080

      7x 1080 TI

      2x 1070

Unconfirmed

      We are missing the GPU count from two of our crackers. You know who you are, if you read this please report back.

Brief challenge overview

The 7zip archives and half-md5 hashes were cracked easily using John the Ripper (JTR). JTR, built against the correct libxcrypt library, was subsequently used to crack the yescrypt hashes, as it was the only cracker supporting yescrypt.

The web.conf challenge involved setting up Jasypt (a Java simplified-encryption library). The runme.sh indicated that each line in output.txt was encrypted with PBEWITHMD5ANDDES, with the input and the password being the same; therefore, as verification, a correctly decrypted ciphertext would be identical to the password. Initially, the decrypt.sh provided by Jasypt was used to manually feed in candidates; a python script was later written to automate the process, though it wasn’t relied on, as it was easy enough to guess by hand once the pattern was observed. As a team, we quickly formed the URL for this challenge and deciphered it to reveal the ‘tennis shoes’ hash list.
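
For reference, a rough Python equivalent of that verification, assuming Jasypt's defaults (PKCS#5 v1.5 key derivation, 1000 MD5 iterations, and an 8-byte salt prepended to the ciphertext before base64 encoding):

```python
import base64, hashlib
from Crypto.Cipher import DES  # pip install pycryptodome

ITERATIONS = 1000  # Jasypt default

def pbe_md5_des_decrypt(b64_blob: str, password: str):
    """Decrypt a PBEWithMD5AndDES blob; None if padding is invalid."""
    raw = base64.b64decode(b64_blob)
    salt, ciphertext = raw[:8], raw[8:]
    dk = password.encode() + salt
    for _ in range(ITERATIONS):               # PBKDF1 with MD5
        dk = hashlib.md5(dk).digest()
    key, iv = dk[:8], dk[8:16]
    plain = DES.new(key, DES.MODE_CBC, iv).decrypt(ciphertext)
    if not plain:
        return None
    pad = plain[-1]
    if not 1 <= pad <= 8 or plain[-pad:] != bytes([pad]) * pad:
        return None
    return plain[:-pad]

def check(b64_blob: str, candidate: str) -> bool:
    # A candidate is correct when the plaintext equals the password itself
    return pbe_md5_des_decrypt(b64_blob, candidate) == candidate.encode()
```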

The LoopAES challenge: while this appeared to be a mountable filesystem, the file length did not support that, so it was presumed to be aespipe output instead. However, since there is no built-in way to check whether the file decrypted correctly, a quick perl script was produced to run sanity checks on the output and detect a successful decrypt.
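
A rough Python equivalent of such a sanity check; the thresholds and file name are illustrative:

```python
import math
from collections import Counter

def shannon_entropy(buf: bytes) -> float:
    counts = Counter(buf)
    return -sum(c / len(buf) * math.log2(c / len(buf))
                for c in counts.values())

def looks_decrypted(buf: bytes) -> bool:
    """Heuristic: mostly printable text, or entropy well below the
    ~8 bits/byte expected of ciphertext."""
    printable = sum(32 <= b < 127 or b in (9, 10, 13) for b in buf)
    return printable / len(buf) > 0.9 or shannon_entropy(buf) < 6.0

with open("candidate.out", "rb") as fh:   # hypothetical decrypt output
    print(looks_decrypted(fh.read(4096)))
```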

list23-Authoritiesappeart…gpg was cracked using JTR with a small dictionary containing the potential candidates, while the heavy lifting was carried out by JTR's rule engine.

DEFCON-with-key.kdbx; initially attempted with JTR. Once the keyfile was released during the contest, we had issues obtaining a valid hash from keepass2john. Closer inspection showed that Korelogic had messed with the keyfile: instead of base64-encoded data, it contained a hex string. Fixing the keyfile involved decoding the hex and re-encoding the result as base64 in order to extract a valid hash. Admittedly, more time was spent here than we would like to admit before someone tried opening the KeePass database with only the keyfile and no password.

DEFCON.kdbx (released later in the contest); once the file was released, it was converted to a hash and cracked with hashcat.

Gocryptfs; a VM was spun up for this challenge and the hash was quickly cracked manually on the 4th try; no cracking software was used.

Salt_and_pepper took us slightly longer than expected to identify the salts and peppers for. We did test other hash types as a precaution.

The odt file was cracked using Passware's Passcovery, though JTR should have been able to handle it as well. We also found out that MS Word does not open password-protected OpenOffice documents.

Riddled_wrapped_in_an.zip was cracked with both JTR and Passcovery independently.

Problems

We did have a slight issue parsing the nsldaps SSHA-1 hashes; this was quickly corrected, though it accidentally triggered a double upload to Korelogic. Other than this incident, we did not run into any submission difficulties.

We wasted a considerable amount of time on cracking the initial uncrackable KeePass file. Although the name did suggest a key was needed, it wasn't evident whether this was a hint about the hashes inside, so it was worth at least trying. GPUs were put on the hash extracted from this file but, obviously, yielded no results.

We initially used hashcat mode 25400 to attack the converted pdf hash. This resulted in many false positives, many containing untypable characters, which failed to open the file. Mode 10500 was then used to obtain the correct password. The pdf then had to have its security removed, as restrictions prevented the hashes from being copied out, and once copied out, slight reformatting was necessary to make them usable.

The fooo file from the enigma zip: resources were spent analysing and tearing apart this bizarre file. Analysis indicated it was a 3.8M block of data repeated 27 times, which suggested the data was not encrypted, given the low entropy of the output. Tools used included binwalk, veracrypt, random and suspicious enigma file encryption/decryption applications from Softpedia and SourceForge, and finally the FreeBSD enigma crypt utility. Like the LoopAES decrypts, an automated decrypt-and-validate script scanned the decoded output for meaningful data; our attempts to decode the file were futile. Another suggestion was that, since the data was nicely divisible by 16/20/32 bytes, it could be MD5/SHA1/SHA256 hashes dumped straight out of memory. The bytes were reassembled in both endian forms, then run through crackers to try to “de-hash*” them.
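
A small sketch of the repeating-block check described above (our own reconstruction, not the actual tooling):

```python
def find_repeat(path: str):
    """Return (block_size, count) if the file is a single block
    repeated verbatim, else None."""
    data = open(path, "rb").read()
    n = len(data)
    # Only block sizes that evenly divide the file length can repeat
    for size in (d for d in range(1, n // 2 + 1) if n % d == 0):
        if data == data[:size] * (n // size):
            return size, n // size
    return None

print(find_repeat("fooo"))   # the challenge file
```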

Analysis

The actual “de-hash*” procedure was relatively straightforward: once a challenge was solved, the hashes were parsed and uploaded to our hash management system. From there, members worked collaboratively but autonomously. Once a pattern was spotted, it was reported and shared; members parsed their own lists, which ensured varied coverage for that wordlist set. This let us cover each other's parsing errors, tooling issues and, hopefully, differences in the plaintext generation used by Korelogic.

After a hashset was cracked close to completion, the difficulty would increase drastically, either because we were missing the basewords/phrases or a ruleset or pattern matching the plaintexts. To push the last few percent, members would shift their attack strategy or redeploy to a different algorithm, as a fresh set of eyes/wordlists does wonders.

Although we had a distributed hashtopolis instance set up, we did not have to tap into it. We did not require a large amount of compute for this contest; even if it had been available, the challenges would have been the bottleneck rather than compute. Some of the challenges were even solved on a laptop or legacy hardware. Where possible, we tried to reduce the keyspace tested to a minimal number of candidates. It was also beneficial to communicate what everyone was doing, to prevent overlapping work. The most important part was to recognize the patterns and adapt quickly, or leverage the correct toolset.

The SHA224 hashes proved to be quite intriguing. We identified two basewords, ‘hacker’ and ‘homecoming’, and quickly noticed the plaintexts were heavily mangled versions of these words: case toggles throughout the whole word, coupled with insertions/deletions, overstrikes and character swaps. It was evident that, although there were only 2 simple basewords, this would result in an insanely large keyspace as the plaintext length increased. Once an adequate number of plains was cracked, some users switched over to markov models, including OMEN, PCFG_Cracker and PrinceProcessor, and these tools gave us a decent number of hits. We also cycled the rules extracted from the plaintexts through other basewords to identify potential new basewords; we thought we had found another (‘ihatecooking/ihatecoding’), though this was possibly an artifact of the mutations on the original two basewords.

Thoughts

It would appear we were the first team to submit cracks for every algorithm, though we are unsure whether we were the first to actually solve all the challenges, as other teams may have been working on the hashes, still parsing them, or simply not yet submitting cracks.

If we didn’t push the envelope of password cracking, we sure pushed the envelope of interpreting a chunk of repeating bytes and demonstrating our creativity in trying to make something useful out of it. More than enough resources were spent on this task, if you are reading this Korelogic, please check your inbox for a laugh.

This year's contest was structured quite differently to last year's. Teams who were unable to solve the challenges could not unlock the hashes, which in a way pushed teams to explore. However, it punished teams less experienced with CTF-style challenges, leaving huge amounts of processing power idle. We tried our best to keep Team Hashcat on their toes and traded places with them 5 times throughout the contest. Team Hashcat demonstrated their expertise and skills by cracking close to the most hashes across the algorithms, while Hashmob kept up by solving the hashes quite rapidly once a challenge was unlocked. Towards the end of the contest, when all the teams had solved all the challenges, it came down to being able to constantly supply new founds, as this allowed us to at least maintain our position.

As a group we had excellent team synergy; as a contest, other than the red herrings, it was well thought out and planned; and as a competition, we thoroughly enjoyed facing our competitors Team Hashcat, Hashmob, john-users, achondritic and trontastic.

Footnotes:

We are well aware that you cannot “de-hash” anything, as hashes are strictly one-way digests. That has not stopped others from using the term, and it has not stopped us from having a friendly poke at it.