View Source, Steal Identity:
HTML Security

A quick lesson in hacking and defending


Imagine this.
A local college hired you to develop their faculty directory on their school web site.
Days later, all the faculty accounts were compromised.
What happened?

There's a useful thing that hackers love to skim through called the page source. Sometimes, with one right click of the mouse, we can quickly figure out the syntax of the username and password queries on the server side. We can also follow the trail of hidden documents inside the HTML: file paths that could be the next piece of the puzzle in unlocking the server hosting the website, or insecure file paths that lead to critically sensitive information.

Hackers begin at the reconnaissance stage of the Cyber Kill Chain. This is simply the stage when we grab all the information we can, either to tailor our social engineering or to build a giant master list of contact information for hacking faculty accounts. Cyber defenders don't want to make that easy, and they shouldn't. But it's not up to them, not at first. It's up to whoever is developing the page.

We're going to look at a random school page whose name I won't disclose. All personal contact information scraped from this website will be hidden. Note that this was a pen testing activity authorized by the school.

In the screenshot below, I marked out all the personal information. I didn't even need internal access. As a random online visitor, I could already see the faculty's full names, office locations, specific departments, email addresses, and even a phone number to reach them. Now I want to bring one thing to your attention: the index.html file for the faculty directory.

screenshot of directory

Say my goal for the reconnaissance stage is to rip all the email addresses off the public-facing faculty directory page. I could easily go into the page source, right? Maybe a quick copy and paste? But there might be over a thousand emails to work with. Now what?

screenshot of page source

Well, when I searched the page source with CTRL + F for mentions of "email," I found 1,352 occurrences. But a quick bit of navigation showed me that I really had 1,350 faculty email addresses sitting there for me to collect. I have a Kali Linux virtual machine that I've named noobhax, with a personal account called noob. I just have to open up a terminal window and give some really simple commands. You could use curl, and I suggest you look into it. But I'm gonna do something just as easy. I'm going to use wget.

screenshot of wget command

When I use wget, I'm asking my machine to pull the page source from the target website and download it straight to me. Here you'll see that I blocked off the real domain name and IP addresses. You'll also see an "index.html" at the bottom. The page source was downloaded automatically to my machine as "index.html."
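
If you'd rather see the same download step outside of wget, here is a minimal sketch in Python using only the standard library. The function name download_page and the URL in the comment are hypothetical placeholders, not the command from the screenshot, and the real target stays undisclosed.

```python
import shutil
import urllib.request
from pathlib import Path


def download_page(url: str, dest: str = "index.html") -> Path:
    """Fetch the raw page source at `url` and save it to `dest`."""
    out = Path(dest)
    # Stream the response body to disk unchanged, the same way wget
    # writes the page it receives to a local index.html.
    with urllib.request.urlopen(url, timeout=30) as response, out.open("wb") as fh:
        shutil.copyfileobj(response, fh)
    return out


# Example call (hypothetical URL; only fetch pages you are authorized to test):
#   saved = download_page("https://example.edu/faculty/index.html")
#   print(saved, saved.stat().st_size, "bytes")
```

Nothing here needs a login or any special access. It asks for the same public page any browser would.
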
What's insane is that I can also auto-rip all the email addresses straight out of this huge HTML file and have them automatically sorted the way I like, with the output saved into a file I'll call "emails.txt."

screenshot of grep command
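
The same rip-and-sort idea can be sketched in Python. This is an illustration under assumptions, not the exact grep command from the screenshot: the regular expression is a deliberately simple email pattern, the function names are hypothetical, and lowercasing plus de-duplication are my own choices that the original command may not make.

```python
import re
from pathlib import Path

# A simple illustrative pattern (an assumption), not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html: str) -> list[str]:
    """Return the addresses found in `html`, lowercased, de-duplicated, sorted."""
    unique = {match.lower() for match in EMAIL_RE.findall(html)}
    return sorted(unique)


def save_emails(source: str = "index.html", dest: str = "emails.txt") -> int:
    """Read `source`, write one address per line to `dest`, return the count."""
    html = Path(source).read_text(encoding="utf-8", errors="replace")
    emails = extract_emails(html)
    Path(dest).write_text("".join(e + "\n" for e in emails), encoding="utf-8")
    return len(emails)
```

De-duplicating matters because a directory page often lists each address twice, once in a mailto: link and once as visible text.
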

Now the email addresses are in alphabetical order. I typed tail emails.txt to look at the last ten email addresses. They all began with Z. I then typed head emails.txt to check the first ten email addresses I'd scraped from the web page's HTML file. They were all As. You'll also see the command wc -l emails.txt, which I ran to count how many lines matched my extraction rule for email addresses. It confirmed 1,350 emails. Now I have a list of email addresses that I can feed into a password cracking tool. It can be that easy. (I already proved that I could do it to the school, so don't worry. I'm not gonna do it again.)
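
For reference, the three sanity checks above (head, tail, and wc -l) can be rolled into one small Python helper. This is a sketch only. The function name summarize is hypothetical, and it assumes emails.txt holds one address per line.

```python
from pathlib import Path


def summarize(path: str = "emails.txt", n: int = 10) -> tuple[list[str], list[str], int]:
    """Return the first `n` lines, the last `n` lines, and the total line count."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    first = lines[:n]                        # like: head emails.txt
    last = lines[max(len(lines) - n, 0):]    # like: tail emails.txt
    return first, last, len(lines)           # count is like: wc -l emails.txt


# Example call:
#   first, last, total = summarize("emails.txt")
#   print(first, last, total)
```
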

You might be surprised to find how easy it is to do this. It took me a couple of minutes to rip, sort, and create a ready-made wordlist to feed anywhere. It's even easier to protect these emails by using both HTML and JavaScript to obfuscate the contact information, so hackers can't scrape it so easily and jump straight to building the next phases of their more personalized attacks. But I'll have to teach you that one next time.