Tuesday, May 21, 2019

CracktheCon Year 1

Over the years, the members of the group have participated in quite a few password cracking contests: Positive Hack Days’ Hashrunner, Defcon/Derbycon’s Crack Me If You Can, and SaintCon’s Pcrack. In fact, Hashrunner was the first contest Cynosure Prime competed in as a team. We love password cracking, especially come contest time. There is something special about the team atmosphere during contests: the problem solving, the ideas that come from those problems, the group chat, and the light-hearted jokes and conversation between discoveries. It’s fun, enough said.

The group decided it would be pretty awesome to design a password cracking competition of our own. Our member winxp5421 has been attending and speaking at Cyphercon for the last couple of years, and the con seemed like a great fit for what we wanted to do. So discussions started with the con owner, Mike Goetzman, about putting together a password cracking village. Mike is an enabler of the best kind: tell him about something you would like to do, and the man will push you and help any way he can to make it a reality. We can't thank him enough for his support! Thank you, Mike!

A lot of thought was put into designing the challenges so that the street teams could participate with little experience and/or little hardware, while still keeping things hard enough that more experienced teams wouldn’t just blitz through the challenges. This is a difficult balance to achieve, as challenges cannot be so difficult that no one solves them. The pro team challenges contained significantly tougher hashes, and certain challenges required teams to think outside the box, such as using techniques to gather the hashes or cracking hashes with no mainstream cracker support.


We used Google Cloud Platform for our server hosting, mainly because of the free $300 Google Cloud credit you get as a trial. It took us just under a year to design, develop, test, and run the contest (in our free time, of course). That $300 credit was just enough for us to complete the contest without paying for any computing resources out of our own pockets (thanks, Google!). We don't have sponsors of any kind, so everything had to be done on the cheap, and this was a great way to kick off the contest with no upfront cost. Come contest time, we had two servers: one for the DNS challenge (more on that later) and the other running the main contest site. The main contest site had 8 CPU cores, 16GB of RAM, a 128GB SSD, and a 500GB HDD. This proved to be WAY overpowered during the contest, even with team submissions containing lots of duplicate “founds”. This was mainly due to the “pre-filter” designed by hops to catch incorrect and duplicate submissions before they hit the database. This pre-filter saved us from performance issues big time.
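The pre-filter idea fits in a few lines. This is a minimal, hypothetical sketch, not hops' actual implementation: it assumes the server keeps every challenge's expected plaintext in memory (the `PreFilter` class and its `solutions` dict are made-up names) and only hands new, correct founds to the database.

```python
class PreFilter:
    """Hypothetical in-memory pre-filter: drop duplicates and wrong plains
    before anything reaches the database."""

    def __init__(self, solutions):
        # solutions: hash -> expected plaintext (assumed loaded at startup)
        self.solutions = solutions
        self.credited = {}  # team -> set of hashes already credited

    def filter(self, team, lines):
        """Return only the (hash, plain) pairs that deserve a DB write."""
        seen = self.credited.setdefault(team, set())
        accepted = []
        for line in lines:
            h, sep, plain = line.partition(":")
            if not sep or h in seen:
                continue  # malformed line, or this team already has it
            if self.solutions.get(h) != plain:
                continue  # unknown hash or incorrect plaintext
            seen.add(h)
            accepted.append((h, plain))
        return accepted


pf = PreFilter({"5f4dcc3b5aa765d61d8327deb882cf99": "password"})
batch = ["5f4dcc3b5aa765d61d8327deb882cf99:password"] * 3 + ["garbage"]
print(pf.filter("teamA", batch))  # only one pair survives
```

Because duplicates and wrong answers are rejected with a set and dict lookup, the database only ever sees writes that change a score.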

We found out the theme of Cyphercon this year was “Hidden in plain sight”. Hmm, hiding hashes, founds, etc. in plain sight… not the easiest, but I think we did well designing challenges to conform to the theme as much as we could.

The street challenges ranged from easy to rather difficult. Catchall, OUI:2C3033, and Vision were the easier lists.
YourNameRocks and YouLikeToSing? were of intermediate difficulty.
Brainkiller and Phytology were the most difficult lists given to the street teams.

Vision was one of the easiest lists to pick up on and crack. It was MD5, and the plains consisted of TV show names pulled from the TVDB list.

Catchall used SHA1 and MD5 as the hash algorithms, with names of Pokemon in two languages as the base list and some basic rules to modify the plains. The use of two algorithms with different hash lengths was meant as a subliminal hint for the OUI:2C3033 hash list.

The OUI:2C3033 file name was, of course, the biggest hint of all for the OUI hashes. A quick Google search would reveal that this list had something to do with Netgear routers. A further search would reveal that, by default, Netgear routers use a well-known Wi-Fi password scheme of “adjective noun 1-3 digits”. However, we couldn't make things that easy now, could we? CsP specializes in the strange and obscure when it comes to database leaks. One of the things you quickly discover when dealing with hashes in the wild is that just because a hash looks like one algorithm doesn't mean it is; even worse, a list might contain two different algorithms that look exactly the same. The OUI list actually contained two hash types that both looked like SHA1: plain SHA1, which most teams picked up on very quickly, and the less-discovered MySQL5. This challenge was about teaching our street teams that just because a hash looks like one thing does not mean it is. I think that lesson is something our whole hash cracking community should operate under.
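Why the OUI list was sneaky: MySQL5 (the MySQL 4.1+ `PASSWORD()` function) is SHA1 applied to the raw binary SHA1 digest, so its hashes are also 40 hex characters. The sketch below checks each candidate under both algorithms. `netgear_candidates` follows the “adjective noun 1-3 digits” scheme described above; the word lists passed to it are made-up placeholders, not the contest's lists.

```python
import hashlib
import itertools


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def mysql5_hex(data: bytes) -> str:
    # MySQL5 PASSWORD(): SHA1 of the binary SHA1 digest. MySQL itself stores
    # "*" + uppercase hex; we compare lowercase hex without the "*".
    return hashlib.sha1(hashlib.sha1(data).digest()).hexdigest()


def netgear_candidates(adjectives, nouns, max_digits=3):
    """Yield adjective + noun + every 1..max_digits digit string (leading zeros included)."""
    for adj, noun in itertools.product(adjectives, nouns):
        for width in range(1, max_digits + 1):
            for num in range(10 ** width):
                yield f"{adj}{noun}{num:0{width}d}"


def identify(target_hex, candidates):
    """Return (plain, algorithm) for the first candidate matching either type."""
    target = target_hex.lower().lstrip("*")
    for cand in candidates:
        raw = cand.encode()
        if sha1_hex(raw) == target:
            return cand, "sha1"
        if mysql5_hex(raw) == target:
            return cand, "mysql5"
    return None
```

Testing both algorithms per candidate costs two extra SHA1 operations, which is negligible next to the cost of missing half the list.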

YourNameRocks consisted of md5apr hashes. The plains were pulled from the very well-known RockYou list as well as a Facebook names list, and the base lists were also modified with some basic rules.

YouLikeToSing?, as the name suggests, contained song lyrics, but with some text scrambling and tougher rule manipulation.

Brainkiller took the same concepts as the pro team Brainkiller hashes, just with easier cost factors and easier plains.

Phytology. Oh, how the Phytology list gave our street teams headaches. It's the only list where zero (that's right, zero) teams found even a single hash. That is more our fault than the players'. It was just too tough. This list was designed to be the “great equalizer”, just in case we had a team that was not playing in the correct weight class, but it was too much; a little over the top, we could say. As the name suggests, this list was based on plant names: not layman's plant names, but the scientific names of plants, with ?d?d?d and ?d?d?s appended to them. It was just too much, and we are sorry. Chalk it up as a design/creation learning experience.
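The arithmetic shows why it hurt: ?d?d?d adds 10^3 = 1,000 candidates per name and ?d?d?s adds 10 × 10 × 33 = 3,300 (hashcat's ?s holds 33 characters, space included), so every scientific name costs 4,300 guesses. The sketch below expands the two masks in Python. How the names were written (spacing, case) is an assumption, and "Quercusrobur" in the tests is only an illustrative name.

```python
import itertools
import string

# hashcat built-in charsets used here: ?d = 0-9, ?s = space + ASCII punctuation (33)
CHARSETS = {"d": string.digits, "s": " " + string.punctuation}


def expand_mask(mask):
    """Expand a mask made only of ?d/?s tokens (e.g. '?d?d?s') into strings."""
    pools = [CHARSETS[mask[i + 1]] for i in range(0, len(mask), 2)]
    for combo in itertools.product(*pools):
        yield "".join(combo)


def phytology_candidates(names, masks=("?d?d?d", "?d?d?s")):
    """Append every mask expansion to every scientific name."""
    for name in names:
        for mask in masks:
            for suffix in expand_mask(mask):
                yield name + suffix
```

In practice a cracker would run these as hybrid wordlist+mask attacks; the generator just makes the per-name keyspace concrete.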

InTheZone:

 You know what would be a great algo to hide in plain sight? NSEC3. It's always in plain sight. The hint, or start, of this challenge was simply “dig +dnssec txt starthere.crackthecon.0x23.pw”. This challenge was all about walking a DNS zone, picking up NSEC3 hashes along the way, cracking them, and submitting the solutions.

Plzadd2hc:

This list was obtained by querying the TXT records from InTheZone. Once you found a hash by querying the DNS server, the server would return unfortunate news: an Argon2 hash. Teams that found them were unable to use hashcat to crack these (hence the list name). Seeing as these hashes are rough going, the list used the same plaintexts as the NSEC3 hashes, with some case twiddles.
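Since the Argon2 plains were the NSEC3 plaintexts with case twiddles, one approach is to take every cracked NSEC3 plain, generate a small set of case variants, and check each with an Argon2 verifier outside hashcat (for example the argon2-cffi package's `PasswordHasher.verify`). The exact twiddles are an assumption; this sketch covers all-lower, all-upper, capitalized, swapped case, and single-position toggles (like hashcat's `TN` rule).

```python
def case_twiddles(word: str):
    """Return simple case variants of a known plaintext (assumed twiddle set)."""
    variants = {word, word.lower(), word.upper(), word.capitalize(), word.swapcase()}
    # toggle the case of one position at a time, like hashcat's "T N" rule
    for i, ch in enumerate(word):
        if ch.isalpha():
            variants.add(word[:i] + ch.swapcase() + word[i + 1:])
    return sorted(variants)
```

Keeping the candidate set tiny matters here: every Argon2 guess is expensive, so it pays to spend them only on variants of plains already known to be right.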

Brainkiller:

Brainkiller consisted of 4 levels:
- L1: easy plains which are crackable without much prior information. This list also had salts that were base64-encoded ASCII art as hints.
- L2: one static salt for all hashes; more difficult passwords used.
- L3: all cost factors set to 19, but the hashes were generated with a cost in a lower range. Plains should be of medium difficulty.
- L4: the salts contain information on which founds from L1-L3 are taken as base words and concatenated with slight modifications.
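Two of these Brainkiller ideas are easy to sketch. bcrypt encodes its 16-byte salt with its own base64 alphabet (`./A-Za-z0-9`, no padding), so a salt-borne hint needs that alphabet to decode; and an L3 hash that claims cost 19 can be retried at lower costs by rewriting the cost field. The helper names below are hypothetical, and the cost rewrite assumes a standard `$2a$`/`$2b$`/`$2y$` hash string.

```python
import base64

# bcrypt's base64 alphabet vs. the standard one (same bit order, different symbols)
BCRYPT_B64 = "./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
STD_B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
_TO_STD = str.maketrans(BCRYPT_B64, STD_B64)
_TO_BCRYPT = str.maketrans(STD_B64, BCRYPT_B64)


def bcrypt_b64decode(s: str) -> bytes:
    std = s.translate(_TO_STD)
    return base64.b64decode(std + "=" * (-len(std) % 4))


def bcrypt_b64encode(raw: bytes) -> str:
    return base64.b64encode(raw).decode().rstrip("=").translate(_TO_BCRYPT)


def salt_bytes(bcrypt_hash: str) -> bytes:
    """Decode the 22-character salt of a $2x$NN$ hash into its 16 bytes."""
    salt = bcrypt_hash.split("$")[3][:22]
    return bcrypt_b64decode(salt)[:16]


def with_cost(bcrypt_hash: str, cost: int) -> str:
    """Rewrite the cost field, e.g. to retry an L3 hash at each lower cost."""
    parts = bcrypt_hash.split("$")
    parts[2] = f"{cost:02d}"
    return "$".join(parts)
```

For L3, each guess would be verified against `with_cost(h, c)` for every plausible `c`, which multiplies the work by the size of the cost range but avoids paying for cost 19.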

BreakingBad:

 These hashes were bcrypt cost 10. This list has a distinct science theme, consisting of elements and words sampled from the alchemy game.

DESleppard:

 Lyrics from various artists, including Justin Bieber, with wacky mutations and reshuffles.

AprilFools:

 Words from various sources with rules that hashcat cannot handle mixed in.


Hiddeninplainsight:

 This challenge was really cool. We generated 7z archives where the password to each archive was hidden in the CRC values of the files inside the archive itself. A literal “Hidden in plain sight” challenge. The first archive’s password was simply the CRC32 of the file inside. The other two were a little more devilish, but once you reached the final archive, you would be given the FantasticFour hashes.
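The first archive's password can be recomputed from the file itself. A minimal sketch using Python's `zlib.crc32`; formatting the CRC as 8 uppercase hex digits (as 7-Zip lists it) is an assumption, so trying lowercase as well would be sensible.

```python
import zlib


def crc32_hex(data: bytes) -> str:
    """CRC32 as 8 uppercase hex digits (assumed to match how 7-Zip lists it)."""
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08X}"


def crc32_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through zlib.crc32 so large files need not fit in memory."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return f"{crc & 0xFFFFFFFF:08X}"
```

Note that the CRC is also visible without the password in the archive listing, which is exactly what made it “hidden in plain sight”.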

FantasticFour:


Generally, one easy hash is included in each set to help crackers narrow down the wordlist.
- 5x bcrypt hashes based on Cyphercon/Wisconsin
- 5x MD5 hashes based on dog breeds, with a varying number of rounds (max 1337); a hint is provided in one of the hashes, which is suffixed with 1337
- 5x GRUB2 PBKDF2 hashes based on colors
- 5x DCC2 hashes based on stars (the hint is revealed if you apply ROT13 to the usernames of the hashes)
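Both hints can be checked mechanically. This sketch assumes each MD5 round hashes the lowercase hex digest of the previous round (a common convention, not confirmed for this contest), caps the search at 1337 rounds as the hint suggests, and uses Python's built-in `rot13` codec for the DCC2 usernames. "beagle" and "fgne13" are illustrative values, not contest data.

```python
import codecs
import hashlib


def md5_rounds(candidate: str, n: int) -> str:
    """md5 applied n times, feeding each round's hex digest into the next."""
    data = candidate.encode()
    for _ in range(n):
        data = hashlib.md5(data).hexdigest().encode()
    return data.decode()


def md5_rounds_match(candidate: str, target: str, max_rounds: int = 1337):
    """Return the round count 1..max_rounds that reproduces target, else None."""
    data = candidate.encode()
    for n in range(1, max_rounds + 1):
        digest = hashlib.md5(data).hexdigest()
        if digest == target:
            return n
        data = digest.encode()
    return None


def rot13(text: str) -> str:
    return codecs.encode(text, "rot13")
```

Checking every round count incrementally costs at most 1337 MD5s per candidate, since each round reuses the previous digest.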

100% salt-free 110% hassle:

- Sampled from isp, soundhound, hashkiller & domains lists
- Various names from names + wordlist combos, using rules hashcat does not support
- Sampled from sound_cloud, names, isp, domains txt
- Apply ultramap: 1 char to many chars remapping, e.g. w -> \/\/ and vv, H -> |-|
- Apply bizarre half reverse
- Apply suffix first char
- Apply prefix last char
- Apply rules to memory + non-memory
- Use mixed char swaps
- Use blockinsert rule
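Several of these rules are simple to express as string functions. The sketch below is an interpretation, not the contest's generator: `half_reverse` assumes “half reverse” means reversing the second half of the word, and `ultramap_variants` remaps one occurrence at a time using a caller-supplied one-to-many table.

```python
def ultramap_variants(word, mapping):
    """Remap ONE occurrence at a time, one char -> many chars (e.g. w -> vv)."""
    out = []
    for i, ch in enumerate(word):
        for repl in mapping.get(ch, ()):
            out.append(word[:i] + repl + word[i + 1:])
    return out


def half_reverse(word):
    """Assumed meaning: keep the first half, reverse the second half."""
    mid = len(word) // 2
    return word[:mid] + word[mid:][::-1]


def suffix_first_char(word):
    return word + word[:1]


def prefix_last_char(word):
    return word[-1:] + word


MAPPING = {"w": ["\\/\\/", "vv"], "H": ["|-|"]}  # examples from the list above
```

Remapping a single position per variant (rather than every occurrence at once, as hashcat's `s` rule does) is what makes these candidates hard to reach with stock rules.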


SuperList:

nouns (len 3-5) + ? combo with superheroes (mutated), superheroes + superheroes (mutated) + rules


A lot of the passwords were base words sampled from RockYou, fbnames, and tmto, while some were collected, such as the SuperList, which was taken from the superhero directory, or BreakingBad, which contained elements but also had words sampled from the alchemy game mixed in. The most interesting part was the rules that were applied to the words. Many of the rules either did not exist or were not conventional, including single-char replacements as opposed to the general swap-all, and one-char-to-many-char mappings rather than one-to-one. Some trickier ones included a forward blockclone from any position, as opposed to the standard start and end blockclone, and semi-reversing words. Rules were applied both to the memorized word/portion and to the non-memorized portion. The idea was for teams to use conventional rules to discover the base list, or perhaps spot some of the non-conventional rules and then apply these to the base lists.
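hashcat's built-in blockclone rules (`y N` and `Y N`) only duplicate the first or last N characters. A hypothetical sketch of the two less conventional ideas mentioned above: cloning a block from any position, and applying a rule to only the memorized or only the non-memorized portion of a word. The function names and exact semantics are assumptions.

```python
def blockclone_at(word, pos, length):
    """Assumed semantics: duplicate word[pos:pos+length] right after itself."""
    if pos < 0 or length <= 0 or pos + length > len(word):
        return word  # out of range: leave the word unchanged (assumption)
    block = word[pos:pos + length]
    return word[:pos + length] + block + word[pos + length:]


def all_blockclones(word, max_len=3):
    """Every block clone of length 1..max_len from every position."""
    return {blockclone_at(word, p, n)
            for n in range(1, max_len + 1)
            for p in range(len(word) - n + 1)}


def apply_to_portion(word, split, rule, to_memorized=True):
    """Apply `rule` to word[:split] (the memorized part) or to word[split:]."""
    head, tail = word[:split], word[split:]
    return rule(head) + tail if to_memorized else head + rule(tail)
```

A word of length L yields up to roughly 3L clones with `max_len=3`, so these rules expand a base list far less than a full rule file would.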

We were surprised that while teams were able to unlock the FantasticFour hashes, they were either not able to find the patterns or did not notice the hints. Each set of hashes from the FantasticFour challenge contained at least one easy hash, which was supposed to give crackers some idea of where to look. There was a hint for the DCC2 hashes where the usernames suggested something was going on: the 13 was present to suggest ROT13 was used, and if this was applied to the letters of the usernames, it would have spelled ‘star’, which is what that list was based on. Each set of hashes was based on a theme, and the MD5 set was made extra hard by using varying rounds of MD5. Cracking the ‘easy’ MD5 should have prompted users which theme to focus on, and the ‘1337’ was supposed to indicate the upper bound of the rounds. It is possible we didn’t allocate enough points to entice users to crack these, or perhaps we made them too difficult, as we wanted to justify the number of points allocated to them.

With the first CracktheCon contest behind us, I have to say things went really well. The contest site itself was rock solid; however, the contest was not without its problems, both player-facing and behind the scenes.

On the first day of Cyphercon, beer was spilled on hops’ laptop. So hops spent most of the first day disassembling his laptop a few times, trying to get his keyboard functional again. It was quite the sight watching him rinse his keyboard off in the bathroom sink. Luckily, the expensive parts of the laptop were not damaged, and a replacement keyboard arrived a couple of days later. He’s now getting used to the US lazout, darn…, layout.

Our member winxp5421 has quite the server infrastructure in his basement. A lot of CsP members use these machines for testing, development, etc. These machines were not being used for the actual contest site, but they were being used for logging and visibility into what was happening on the contest servers. For the last few months leading up to the contest, Winxp had been working on plans to finish his basement, and it just so happened that the only time the framers could come put the basement walls up was day one of the convention. That proved unfortunate, as the framers destroyed the coaxial feed line to his cable modem, leaving us in the dark with no immediate remedy. Seeing as Winxp was at the con, a couple of hours' drive from home, this proved to be a difficult fix, but a few phone calls and a buddy-pal-guy conversation later, a good friend came to fix the problem. (Thanks, Nate!) We lost a couple of hours of logging data, but the contest remained unaffected and it was business as usual.

We sent out a reminder email the day before the contest. Each team was supposed to get a personalized email using the team name they signed up with. Unfortunately, everyone got an email addressed to “false”... whoops, a variable screw-up. On top of that, Winxp was incapable of sending a single typo-free email throughout the entire contest. We were starting to think the man had had a stroke.

It’s always DNS:
A couple of hours before the contest’s start, we noticed that the generated plaintexts for the NSEC3 hashes were not unique. Instead of 33,100 hashes, we ended up with only 32,492. As everything was already set up and well tested, we decided not to fix the issue. It wasn’t really a problem, because the worst case was having multiple TXT records (Argon2 hashes) for one domain.

“Problems” with DEScrypt:

On the first day, we received complaints that some DEScrypt hashes couldn’t be submitted. All the “faulty” submissions had at least one high-ASCII character in them, whereas our plains used only characters in the printable ASCII range. There's a fun quirk in DEScrypt: only the low 7 bits of each character make it into the DES key (the eighth bit of each DES key byte is a parity bit), so adding 0x80 to any character generates the same hash. For example, mDPKlCDttlrdw:CynoSure and mDPKlCDttlrdw:$HEX[4379ee6f53757265] are both valid. Teams were submitting “solutions” that did not match our solution.
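The collision can be reproduced without a DES implementation, because it happens during key setup: traditional crypt(3) takes at most 8 characters and shifts each one left by one bit to build the DES key, so the top bit of every character is discarded and each key byte's low (parity) bit is unused. The sketch below mirrors only that key-building step, and adds a hypothetical `normalize_submission` a contest server could apply before comparing plains.

```python
def descrypt_key(password: bytes) -> bytes:
    """The 8-byte DES key crypt(3) DEScrypt derives from a password.

    Only the first 8 characters count; each is shifted left one bit, so its
    high bit falls off and the low (parity) bit of the key byte is zero.
    """
    padded = password[:8].ljust(8, b"\x00")
    return bytes((b << 1) & 0xFF for b in padded)


def normalize_submission(password: bytes) -> bytes:
    """Hypothetical server-side fix: clear high bits and truncate to 8 chars."""
    return bytes(b & 0x7F for b in password[:8])
```

The same key-setup step also explains why anything past the eighth character is ignored, so both effects could be normalized away on the server.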


Brainkiller problems:

#1: When inserting the challenges into the DB, we used batch inserts (50 at once). A few plains seemed to have an invalid encoding (not valid UTF-8). Each batch containing one of these failed, so 50 hashes were not inserted because of a single bad one. This resulted in ~2k hashes that were in the hashlist but not in the DB. The missing hashes were then inserted into the DB, the 2x multiplier was disabled for them, and the hashes with invalid UTF-8 plains were posted as do-not-crack.
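One way to avoid losing whole batches is to validate the encoding before batching. A minimal sketch, with the standard-library `sqlite3` module standing in for the contest database; the `challenges` table and `insert_challenges` function are hypothetical.

```python
import sqlite3


def insert_challenges(conn, rows, batch_size=50):
    """Validate plains as UTF-8 first, then batch-insert only the valid rows.

    rows: iterable of (hash, plain_bytes). Returns the rows that were rejected.
    """
    valid, rejected = [], []
    for h, plain in rows:
        try:
            valid.append((h, plain.decode("utf-8")))
        except UnicodeDecodeError:
            rejected.append((h, plain))
    for i in range(0, len(valid), batch_size):
        with conn:  # one transaction per batch
            conn.executemany(
                "INSERT INTO challenges (hash, plain) VALUES (?, ?)",
                valid[i:i + batch_size],
            )
    return rejected
```

Returning the rejects instead of silently skipping them makes it obvious which hashes need special handling (such as a do-not-crack notice) before the contest starts.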

#2: For the level 1 hashes, the plains, when put in order, formed ASCII art of a song. To give a hint, we also wanted to have a song text in ASCII art encoded in the salts of the hashes. The issue was that the base64 variant we used was not the correct one, so the normal bcrypt base64 decode did not reveal this hint.

During a previous CMIYC contest, KoreLogic had a live points graph. A lot of people like to see live graphs during contests, so we figured it would be a good idea to do the same for ours. That year in particular, hashcat was randomly assigned a color for the graph, and that color happened to be pink. Soon after the contest started, team hashcat’s color changed to something from a darker palette. The RNG gods had blessed hashcat with a great color; we thought it was a shame that it was changed and heckled team hashcat about it. Strangely enough, team hashcat’s color was also pink in our contest. Was it the RNG gods? Or an “if hashcat: then pink”? I'll leave that for you to decide.

HackTheNation was killing it in the upload game, pushing more lines to the API than any other team. In total, the street teams had 654k hashes to crack, yet HackTheNation pushed 73.4 million lines over the 48-hour contest period. Submitting hashes more than once did not gain teams any extra points, but damn if that was going to stop HackTheNation from trying. We did not supply teams with a way to chunk their uploads to conform to the 50,000-line limit set by our API. Our guess as to why they pushed so many lines is that their submission script was simply using “tail -n 50000” to stay within our API call limits. We would love to hear why this team supplied us with so many duplicate hashes.
These frequent duplicate uploads made us really glad we used a pre-filter instead of hitting the DB with a query on each upload, or we would have been in serious performance trouble. Even with all of the duplicate entries, our contest server was WAY overpowered: our load averages stayed below 1.0 for the duration of the contest on an 8-core machine.
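A client can stay under a per-call line limit without resending old founds by remembering what it has already sent and chunking only new lines. A hypothetical sketch: the contest API itself isn't shown, and `upload_batches` only produces the batches a client would send.

```python
def upload_batches(founds, already_sent, limit=50_000):
    """Yield batches of at most `limit` new hash:plain lines, skipping repeats."""
    batch = []
    for line in founds:
        if line in already_sent:
            continue  # sent in an earlier call (or earlier in this one)
        already_sent.add(line)
        batch.append(line)
        if len(batch) == limit:
            yield batch
            batch = []
    if batch:
        yield batch
```

For brevity, lines are marked as sent as soon as they are batched; a real client would mark them only after the API acknowledges the upload, so a failed call gets retried.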

We want to thank Mike Goetzman for all the hard work and dedication he puts into the con each and every year, as well as the rest of the Cyphercon staff. We would also like to thank the participating teams. Turnout for the first password cracking contest we put on was better than we had hoped for. We hope you guys had fun with it and will join us next year. We have some ideas for further team torture.

Contest Solutions are available here

In veneration,
CsP

Tuesday, August 21, 2018

Crack me if you can 2018 write-up




Active participating members: 15
GPUs equivalent to GTX1080 (peak): 60
GPUs equivalent to GTX1080 (constant): 40
CPU threads (peak): 1300
CPU threads (constant): 600
Contest-related instant messages sent: ~7000
Hash:plain submissions to internal platform: >5300
Hash:plain submissions to KoreLogic: 2293



Members

blazer cvsi espira gearjunkie hops m33x mastercracker milzo jimbas mexx666666 s3in!c usasoft user vetronexe winxp5421





Prep

After hearing news that Korelogic would be awarding bonus points for first unique founds, we took precautions to tune our submission process to ensure we could capitalise on this bonus. To avoid false spam triggers, an alternate email provider that supported bulk inbound/outbound requests was used. In addition, various functions on our hash management platform were disabled and tweaked such that the hash:plain pairs could be processed and uploaded quickly at a constant but not too aggressive rate.  We only had a handful of submission troubles which were rectified quickly on our end.





Patterns

It was quite cheeky of KoreLogic to use usernames from the competing teams as plaintexts, and this was spotted quite early on in our MD5 list. Similarly, they were seen in the SSHA and MD5(Unix) lists, and we also noticed that each algorithm was assigned a specific range of starting characters. Seeing as the other teams were getting bcrypts, it appeared that these were possible, and this was where all the points were. While some of our members continued to collect points by exploiting the 4x first-unique-found bonus on the lower-scoring hashes, others worked on trying to get a break on the bcrypt hashes using the patterns we spotted. It was not long before we found the starting characters for the bcrypt hashes using the usernames in double combo mode.
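The username pattern is straightforward to express: take every username pair (a “double combo”) and keep only candidates whose first character falls in the range observed for the target algorithm. A minimal sketch; the usernames and character range in the tests are placeholders, not the contest's values.

```python
import itertools


def double_combos(usernames, allowed_first_chars):
    """Yield username+username combinations (a name may pair with itself)
    whose first character is in the range assigned to the target algorithm."""
    allowed = set(allowed_first_chars)
    starts = [u for u in usernames if u[:1] in allowed]
    for left, right in itertools.product(starts, usernames):
        yield left + right
```

Filtering on the first character before combining is what makes this cheap enough for bcrypt: every rejected left-hand name removes a whole row of the combo table.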


Strategy
Once we had the first bcrypt hit, we tried to uncover the complete list of usernames from the plains found in the faster algorithms. After we were confident we had a solid pattern, we brought up many CPU crackers running MDXfind to work solely on the bcrypt hashes. It was a little chaotic initially as we tried to figure out the best way to distribute the bcrypt workload. One of our members then stepped up and became the central point for distributing tasks, but task distribution and requests were still done manually. Soon another member whipped up a semi-automated procedure where each member could request custom tasks from a central distribution list. At our peak we utilised roughly 1300 CPU threads, with around 600 sustained threads throughout the contest. A small cluster of 16 ODROIDs (XU4) running MDXfind-ARM was also used to attack the bcrypt hashes. Side note: it was relatively cheap and efficient to attack bcrypts using ARM cores. Each ODROID gave us roughly 50 H/s (800 H/s in total) on the contest’s bcrypt hashes (cost factor 10), and the cluster in total uses approximately 200 W. This results in an efficiency of 4 H/s/W.
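The semi-automated task list can be pictured as a shared queue of keyspace chunks that members pull from. This is a hypothetical sketch of the idea (chunks as candidate offset ranges, a made-up `TaskList` class), not the procedure our member actually wrote.

```python
from collections import deque


class TaskList:
    """Hypothetical central list: split a keyspace into chunks, hand out on request."""

    def __init__(self, keyspace, chunk):
        self.pending = deque(
            (start, min(start + chunk, keyspace)) for start in range(0, keyspace, chunk)
        )
        self.assigned = {}  # (start, end) -> member

    def request(self, member):
        """Give the next unassigned chunk to `member`, or None when done."""
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.assigned[task] = member
        return task

    def release(self, task):
        """Put an unfinished chunk back at the front (e.g. a member went offline)."""
        if self.assigned.pop(task, None) is not None:
            self.pending.appendleft(task)
```

Pulling work on request, rather than pushing fixed shares, lets fast and slow machines (from big CPU boxes down to the ODROIDs) finish at roughly the same time.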

Due to the GPU-unfriendly nature of bcrypt, all GPU resources were reserved for the other 3 algorithms, which worked much more efficiently with hashcat on GPU. Members were free to decide how to contribute: some opted to work on patterns alone, devising their own methods and scripts to attack patterns on the algorithms, while others joined the Hashtopolis instance, which had around the equivalent of 60 GTX1080s.
We were generally quite close score-wise with team hashcat and trailed them for the first 15 hours or so of the contest. When one of our members woke up and submitted over 100 unique bcrypts, we leapfrogged hashcat into first place and took a commanding lead. This was a great morale boost, and more CPU instances were placed onto bcrypt as we realized other teams were using different patterns from us and we had identified a very efficient one which yielded many hits for little work. Additional patterns were later identified, such as one where popular suffixes (pass01, pass02, etc.) were used across all of the algos, though these did not seem as efficient as the username combos.

Some stats from our hash management platform showing the rate of uploads (charts: MD5(Unix), SSHA, MD5, Bcrypt)

Afterthoughts
We do regret not switching over to JtR for a nice bcrypt speedup: when more candidates than cores are used, its interleaved bcrypt implementation yields up to twice the speed of MDXfind. We also failed to spot the full range of starting characters for bcrypt and lost some valuable points there too.

Towards the end, we tried to spread the attacks across all the algorithms so we would be ranked highest not only by score but also across algorithms. This was quite hard to maintain, as it seemed both team hashcat and john-users were gaining ground on us. Overall, we were quite impressed with our ability to obtain more unique bcrypt firsts than john-users and hashcat combined, which allowed us to take first place. A massive thanks to KoreLogic for hosting the contest once again; we really enjoyed the added twist this year, as it gave us all an incentive to constantly submit. A shout-out to our competitive rivals, Team Hashcat and john-users, for pushing us hard and making us drink that extra cup of coffee to stay up.

Looking ahead
We have enjoyed playing CMIYC over the years. So, when presented with the opportunity to create our own password cracking contest we jumped at the idea. In 2019, we will be hosting our own CMIYC style contest at Cyphercon in Milwaukee, WI. We hope all of you will join us for the first “Crackthecon”. As more information about the contest is finalized we will update the contest site crackthecon.com.



Tuesday, August 29, 2017

320 Million Hashes Exposed


Earlier this month (August 2017), Troy Hunt, founder of the website Have I Been Pwned? [0], released over 319 million plaintext passwords [1], compiled from various non-hashed data breaches, in the form of SHA-1 hashes. Making this data public might allow future passwords to be cross-checked in a secure manner, in the hope of preventing password reuse, especially of passwords from breaches that were stored in unhashed plaintext.
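Cross-checking a password against the lists just means hashing it and looking it up. A minimal sketch, assuming the lists hold uppercase hex SHA-1 of the UTF-8 password and have been loaded into a set:

```python
import hashlib


def pwned_sha1(password: str) -> str:
    """Uppercase hex SHA-1 of the UTF-8 password (assumed list format)."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


def is_pwned(password: str, pwned_hashes: set) -> bool:
    return pwned_sha1(password) in pwned_hashes
```

Hashing keeps the published list from being directly readable, but as the rest of this post shows, unsalted SHA-1 of human-chosen passwords is fast to reverse at scale.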

Our group (in collaboration with @m33x and @tychotithonus) made an attempt to crack/recover as many of the hashes as possible, both for research purposes and, of course, to satisfy our curiosity, using this opportunity as a challenge. Although each of the Pwned Passwords packs released at the time (3 in total at the time of writing) was labeled as 40-character ASCII-hex SHA-1 hashes, we worked under the assumption that “No hash list larger than a few hundred thousand entries contains only one kind of hash!”, and these lists were no exception.

Nested Hashes
Although the majority of the passwords recovered were plaintext, as expected, we also noticed that a number of the plaintexts were themselves hashes or some other form of non-plaintext. This suggested that we were dealing with more than just SHA-1.

Out of the roughly 320 million hashes, we were able to recover all but 116 of the SHA-1 hashes, a success rate of roughly 99.9999%. In addition, we attempted to take it a step further and resolve as many “nested” hashes (hashes within hashes) as possible to their ultimate plaintext forms. Using MDXfind [2], we were able to identify over 15 different algorithms in use across pwned-passwords-1.0.txt and the subsequent update-1 and update-2 packages. We also added support for SHA1SHA512x01, i.e. sha1(sha512($pass)), to Hashcat [3].
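Resolving nested hashes boils down to trying chained algorithms against a candidate. A minimal sketch with a handful of chains, named loosely after MDXfind's conventions (e.g. SHA1SHA512 for sha1(sha512($pass))); it assumes the inner digest is fed to the outer hash as lowercase hex, and the `CHAINS` table is illustrative, not MDXfind's full list.

```python
import hashlib


def _hex(name, data: bytes) -> str:
    return hashlib.new(name, data).hexdigest()


# outer(inner_hex(pass)); the intermediate digest is assumed to be lowercase hex
CHAINS = {
    "SHA1": lambda p: _hex("sha1", p),
    "SHA1MD5": lambda p: _hex("sha1", _hex("md5", p).encode()),
    "SHA1SHA1": lambda p: _hex("sha1", _hex("sha1", p).encode()),
    "SHA1SHA512": lambda p: _hex("sha1", _hex("sha512", p).encode()),
}


def identify_chain(target_hex: str, candidate: bytes):
    """Return the names of all chains that turn `candidate` into `target_hex`."""
    target = target_hex.lower()
    return [name for name, fn in CHAINS.items() if fn(candidate) == target]
```

Testing many chains per candidate in one pass is the same trade-off that makes MDXfind effective on mixed lists like these.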

Taking a deeper dive into the found “plaintexts,” we realized there were hashes-within-hashes, hashes of seemingly garbage data, what appears to be “seeded” hashes, and more. Here is a list of the hash types we found:

There are other hashes we have not completely resolved yet - some of which may be seeded hashes. For example, we see:

sha1(md5(md5($salt).md5($pass)))
sha1(md5($salt).md5($pass))
sha1(md5(md5($salt1).md5($pass)).$salt2)
sha1(md5($salt1).md5($pass).$salt2)

… and much more.

Personal Identifiable Information
We also saw unusual strings from incorrect import/export that were already present in the original leaks. These link the hash to the owner of the password, which was clearly not intended by Troy. We found more than 2.5m email addresses and about 230k email:password combinations, in forms such as:
<firstname.lastname@tld><:.,;| /><password>
<truncated-firstname.lastname@tld><:.,;| /><password>
<@tld><:.,;| /><password>
<username><:.,;| /><password>
<firstname.lastname@tld><:.,;| /><some-hash>

Trash / Other Non-Passwords
Furthermore, there were obviously other strings that were not passwords, but rather fragments of files.  For example:

005a97e5323dac9a43c06bb5fe0a75973ee5e23f:<div><embed src="http://apps.rockyou.com/fxtext.swf?ID=31478642&nopanel=true&stage=true" quality="high" scale="noscale" width="405.37" height="116.475" wmode="transparent" name="rockyou" type="application/x-shockwave-flash" pluginspage="http://www.macrom


006bb7e8893618b02f979dd425e689b4ae64df10:honeyDo you realize who is in this image: http://thecoolpics.com/who.jpg . Just think for a moment and tell me o you realize who is in this image: http://thecoolpics.com/who.jpg . Just think for a moment and tell me soon ;))

Bad Line Parsing
We observed a number of passwords that appeared to be truncated at length 40 but contained data following the line-feed terminator of the input lists:

n.doe@gmail.com:password:123456jane.doe@

We assumed this was caused either by a parsing error or by some other anomaly. To recover these strangely processed plaintexts, some utilities were coded [4] to emulate the particular behavior of concatenating successive lines while restricting them to 40 characters:

john.doe@gmail.com:password:123456jane.d
ohn.doe@gmail.com:password:123456jane.do
hn.doe@gmail.com:password:123456jane.doe
n.doe@gmail.com:password:123456jane.doe@
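The windowing above can be emulated in a few lines. This is a minimal sketch of the idea, not the tools in [4]: it joins a line with the following one and returns every 40-character window that starts in the first line. The second line, "jane.doe@hotmail.com", is a made-up completion used only for illustration.

```python
def boundary_windows(line: str, next_line: str, width: int = 40):
    """Emulate the suspected bug: successive lines joined without a separator,
    then cut into width-character strings that start inside `line`."""
    joined = line + next_line
    return [joined[i:i + width] for i in range(len(line)) if len(joined) - i >= width]
```

Each window becomes a candidate “password” to hash and test, so a single pair of adjacent lines yields only a handful of guesses.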

Furthermore, to find the correct position where the initial parsing error occurred, we searched our dictionaries from the right to the left (see [4]) concatenating characters like this:

123456jane.doe@ho
o
ho
@ho
e@ho
...
123456jane.doe@ho
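The right-to-left search can be sketched as growing suffixes one character at a time and checking each against a dictionary. This is a hypothetical helper mirroring the listing above, not the code in [4].

```python
def right_to_left_hits(fragment: str, dictionary):
    """Grow suffixes of `fragment` from the right ('o', 'ho', '@ho', ...)
    and return those found in the dictionary, shortest first."""
    return [fragment[i:] for i in range(len(fragment) - 1, -1, -1)
            if fragment[i:] in dictionary]
```

Each hit marks a plausible position where the next record began, which narrows down where the original parsing went wrong.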


 An example of a bad/invalid email imported into the haveibeenpwned.com website

Hashcat’s Hexception
During hash processing, we also caught a glimpse into Troy’s methodology. We believe that he processed some “cracked” passwords as well, as suggested by the presence of $HEX[] plaintexts. This also revealed a bug in Hashcat’s $HEX[] encoding.

For example, consider the following hash:

0b20b6ad0b6c7fd3655e8734cb48c001567983eb:$HEX[244845585b623436653635373737393666373236625d]

Initially, when this was found with Hashcat, it appeared as:

0b20b6ad0b6c7fd3655e8734cb48c001567983eb:$HEX[b46e6577796f726b]

The hash could not be verified as the solution since:

sha1(binary[b46e6577796f726b]):[9def6b97e0095ac93331bc2780cc35a21d9cc752]

We discovered that Hashcat fails to correctly encode a literal string with $HEX[] if the literal string itself starts with $HEX[. This means that if you take the output of Hashcat, say from hashcat.pot, and try to re-crack it using the passwords in the hashcat.pot file, you will end up with “unsolvable” hashes. As part of our work involves building dictionaries that we can reuse, we consider this a significant bug.

Some tools [5] were put together to properly re-encode the output from Hashcat into the correct string:

$HEX[244845585b623436653635373737393666373236625d]

This then works properly as a reusable password with Hashcat and MDXfind, as it decodes into the literal string:

$HEX[b46e6577796f726b]
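The fix amounts to treating a plaintext that literally begins with `$HEX[` as needing encoding itself. A minimal sketch, not the tools in [5]; the rule for which other bytes trigger encoding (anything outside printable ASCII) is an assumption.

```python
def needs_hex(plain: bytes) -> bool:
    # Assumption: encode non-printable/high bytes, and anything that would
    # otherwise be misread as an already-encoded $HEX[...] string.
    return plain.startswith(b"$HEX[") or any(b < 0x20 or b > 0x7E for b in plain)


def encode_plain(plain: bytes) -> str:
    return f"$HEX[{plain.hex()}]" if needs_hex(plain) else plain.decode("ascii")


def decode_plain(text: str) -> bytes:
    """Reverse encode_plain: unwrap one $HEX[...] layer, else take text as-is."""
    if text.startswith("$HEX[") and text.endswith("]"):
        return bytes.fromhex(text[5:-1])
    return text.encode()
```

Because `decode_plain` unwraps exactly one layer, a literal `$HEX[...]` password survives a round trip instead of being decoded twice.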

This issue has been resolved in a beta version of Hashcat [6].

We also uncovered a second bug in Hashcat, which was later corrected in a beta version. When using certain rules, we found that the solutions Hashcat was offering also did not hash back to the correct value. We ended up with hundreds of “solutions” that really were not solutions at all. This is one of the reasons that we always try to double-check our work, to ensure that we have accurate hashes and plaintexts.

As a final check, we took just the SHA1x01 passwords we found and re-ran them through both Hashcat (Beta v3.6.0-351-gec874c1) and MDXfind. The results were quite illuminating. The test system was a 4-core Intel Core i7-6700K with 4x GTX1080 cards and 64GB of memory. Using Hashcat, we found that loading more than about 250,000,000 hashes at a time was not possible [7], and as a result, the list was broken up into chunks of 225m hashes.


Program | Time to Complete | Hashes Found
Hashcat | 55 minutes | 318,932,512
MDXfind (all hashes) | 9 minutes | 318,933,582
MDXfind (225m chunks) | 9 minutes | 318,933,582

From our usage patterns, it is evident that both applications have their strengths and caveats. MDXfind shows its strength when the hashlist is too large to fit into GPU memory, when many algorithms need to be checked in parallel and when very long password strings need to be tested. Hashcat, on the other hand, shines when parallel compute is needed; such as running large rule sets and large keyspaces. Using the tools in tandem gives us the best of both worlds since we can feed the left list of each successive attack into either program to achieve optimal efficiency and coverage.

To further illustrate the problem with password reuse (and the importance of validation), the hashes were re-run using just the passwords found by Hashcat (Beta v3.6.0-351-gec874c1). This resulted in 86,954 hashes not being recovered, primarily due to Hashcat's $HEX encoding error.

Distributed Tasks
Once the hashlist was small enough that its size had a negligible effect on search speed, distributed brute-force and mask attacks were conducted via Hashtopussy [8], a Hashcat wrapper. Combining our hardware, we were able to achieve peak speeds of over 180GH/s on SHA-1; to put things into perspective, that's roughly the speed of 25x GTX1080s. We were able to cover ?a length 1-8, ?l?d length 9-10, and ?b length 1-6 effortlessly.
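The mask sizes explain why these ranges were reachable. A small sketch of the keyspace arithmetic, using hashcat's charset sizes (?a = 95 printable characters, ?b = 256 byte values) and assuming ?l?d was run as a custom 36-character charset; the time estimate simply divides by the 180GH/s peak and ignores ramp-up and distribution overhead.

```python
def keyspace(charset_size: int, min_len: int, max_len: int) -> int:
    """Number of candidates for one charset over a range of lengths."""
    return sum(charset_size ** n for n in range(min_len, max_len + 1))


RATE = 180e9  # peak combined SHA-1 speed from the text, hashes per second

ATTACKS = {
    "?a length 1-8": keyspace(95, 1, 8),      # ?a = ?l?u?d?s = 95 chars
    "?l?d length 9-10": keyspace(36, 9, 10),  # assumed custom charset -1 ?l?d
    "?b length 1-6": keyspace(256, 1, 6),     # ?b = all 256 byte values
}

for name, size in ATTACKS.items():
    print(f"{name}: {size:,} candidates, ~{size / RATE / 3600:.1f} h at peak speed")
```

The longest length dominates each sum, so adding one more character multiplies the runtime by roughly the charset size.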

Statistical Properties
In order to speed up the analysis of such a large volume of plaintexts, a custom tool, “Panal” (to be released at a later time), was coded to quickly and accurately analyse our dataset of over 320 million passwords. The longest password we found was 400 characters, while the shortest was only 3 characters long. About 0.06% of passwords were 50 characters or longer, with 96.67% of passwords being 16 characters or less. Roughly 87.3% of passwords fall into four character sets: LowerNum (47.5%), LowerCase (24.75%), Num (8.15%), and MixedNum (6.89%). In addition, we saw UTF-8 encoded passwords along with passwords containing control characters. See [9] for the full Panal output.
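A charset breakdown like the one above takes only a few lines to compute. This is not Panal: it is a minimal sketch in which the bucket names LowerNum, LowerCase, Num, and MixedNum come from the text, while their exact definitions and the extra buckets (MixedAlpha, UpperNum, UpperCase, Other) are assumptions.

```python
import string
from collections import Counter

LOWER = set(string.ascii_lowercase)
UPPER = set(string.ascii_uppercase)
DIGITS = set(string.digits)


def charset_class(password: str) -> str:
    """Assign a Panal-style charset bucket (bucket rules assumed)."""
    chars = set(password)
    if not chars or not chars <= LOWER | UPPER | DIGITS:
        return "Other"  # empty, symbols, control chars, non-ASCII
    lower, upper, digit = bool(chars & LOWER), bool(chars & UPPER), bool(chars & DIGITS)
    if lower and upper:
        return "MixedNum" if digit else "MixedAlpha"
    if lower:
        return "LowerNum" if digit else "LowerCase"
    if upper:
        return "UpperNum" if digit else "UpperCase"
    return "Num"


def charset_share(passwords):
    """Percentage of passwords falling into each bucket."""
    counts = Counter(charset_class(p) for p in passwords)
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}
```

At 320 million entries, a single streaming pass with a `Counter` like this is preferable to loading everything into memory at once.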

(Figures: password length distribution and character set distribution)

Summary
Blocking common passwords during account creation has positive effects on the overall password security of a website [10]. While blacklisting 320m leaked passwords might sound like a good idea to further improve password security, it can have unforeseeable consequences on usability (i.e., the level of user frustration). Conventional blacklist approaches typically include the 10k most common passwords to limit the consequences of online password guessing attacks. Until now, there has been no evidence to support which blacklist size provides an optimal balance.

Post written in collaboration with @m33x and @tychotithonus

Resources
[0] 2017-08-03: Have I been pwned? by Troy Hunt
https://haveibeenpwned.com
[1] 2017-08-03: Introducing 306 Million Freely Downloadable Pwned Passwords 
https://www.troyhunt.com/introducing-306-million-freely-downloadable-pwned-passwords
[2] 2017-08-03: MDXfind v1.93
https://hashes.org/mdxfind.php
[3] 2017-08-28: Hashcat sha1(sha512($pass)) patch
https://gist.github.com/hops/9beda82cf3d21ab99a2971bf8d00dbb4 
[4] 2017-08-27: Some tools we developed to deal with incorrectly parsed strings
https://gist.github.com/m33x/3e0ab19a53384c036db29f996cb60733
[6] 2017-08-20: Hashcat Issue “hexify also all password of format $HEX[]”
https://github.com/hashcat/hashcat/issues/1340
[7] 2017-08-18: Hashcat Issue Potential Silent Cracking Failures at Certain Hash-Count
https://github.com/hashcat/hashcat/issues/1336
[8] 2017-08-03: Hashtopussy by s3inlc
https://github.com/s3inlc/hashtopussy
[9] 2017-08-29: Panal (Password Analysis) 320m HIBP Passwords
https://gist.github.com/m33x/03031e764ae5de179315270973c5871f
[10] 2017-08-03: Password Creation in the Presence of Blacklists
https://www.internetsociety.org/doc/password-creation-presence-blacklists