TL;DR: Using Splunk for offensive security data analysis has advantages over traditional grep when sifting through and analysing data.

Why Splunk and not ELK?

ELK is a fantastic open source project, made even easier thanks to the HELK project by Cyb3rward0g. In fact, I actually tried ELK first before moving over to Splunk. When I used ELK, I realised that I had to create a .config file specifying the location of my data before it would be ingested. This made things difficult, as it meant that for every small data set I would still need to tell it what the headers and data types were. With Splunk, on the other hand, I managed to upload data by simply clicking through the Web UI and uploading a CSV file.

I know that many offensive security shops are using ELK for logging and analytics; more on this topic in another post. ELK was just not fit for purpose in a quick PoC environment where I wanted to evaluate what sort of resources were required to spin up an offensive analytics system and what benefits could be obtained from doing so. In this post we will focus on using Splunk as a log analysis system to visualise and search data quickly.

Splunk Installation

Quite simple really: go to Splunk's website and download the MSI for Windows, or the relevant package for your OS of choice. I used a machine with 20GB of RAM and a 300GB SSD for the installation. It worked fine, and you could probably get by with less, as Splunk does not seem very RAM heavy.

I'd recommend getting a developer licence, which gives you 6 months to index any data you want into Splunk. The process of indexing is simply importing data into the database and optimising it for searching; just to get the technical search jargon out of the way.

Data Ingestion

Yeah, it's really easy: you can even ingest .json.gz files directly by uploading them. As Project Sonar was too big, I used the command splunk add oneshot sonar.json.gz to get it into Splunk. It took a while to get the data in, but once it was done, searches were blazing fast.
If your data is smaller than 500MB, you can even use the Web UI:
chrome_2018-05-06_23-11-02
chrome_2018-05-06_23-11-34

Project Sonar

To demonstrate how we might use Project Sonar's Forward DNS data, I decided to explore Splunk's capabilities in aggregating and understanding the data.

Content Delivery Networks

Finding domain fronts that work with CloudFront.
Regular grep:

:/$ time zcat fdns_a.json.gz | grep '\.cloudfront\.net' | tee -a cloudfront.net.txt

real    10m25.282s
user    9m16.984s
sys     1m23.031s

Splunk search:

value="*.cloudfront.net"

47.63 seconds

It takes 13.13 times longer to search using grep compared to within Splunk.

Domain Search

Obtain all subdomains for a particular domain:
chrome_2018-05-06_22-59-25
This was almost instant, whereas it would've taken around 10 minutes with a usual grep. What's more interesting is that we can use Splunk's analytical capabilities to perform mappings and answer questions such as "how many domains share the same host?":
chrome_2018-05-06_23-02-50
to get:
chrome_2018-05-06_23-03-46
Here we can see that one particular server is pointed to by all of the hostnames on the right.

We can also roughly map the physical locations of the servers using:
name="*.uber.com" | stats values(name) by value | iplocation value | geostats count by City
to get:
chrome_2018-05-06_23-06-23
Sure, it's probably not much use, but it quickly lets you see where the target's servers are located in the world.

If you are only allowed to attack servers within a particular country for the target organisation, then you can use Splunk to filter the list down too, using:
name="*.uber.com" | stats values(name) by value | iplocation value | search Country="United States"
chrome_2018-05-06_23-08-09
You may even choose to cut it down to a particular state if necessary.

Get a list of all subdomains

You can run the following search to get a list of all the subdomains. I didn't bother waiting a long period of time to parse through 1.4 billion results, so I just did it for one domain:

index=main [search name="*.uber.com"] 
| table name
| eval url=$name$ | lookup ut_parse_extended_lookup url

chrome_2018-05-07_15-22-30

or subdomains specifically with:

index=main [search name="*.uber.com"] 
| table name
| eval url=$name$ | lookup ut_parse_extended_lookup url 
| table ut_subdomain 
| dedup ut_subdomain

chrome_2018-05-07_15-23-47
This might be useful if you are trying to resolve the same subdomains against all of their owned domains to try and discover more subdomains.
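As a rough stand-in for what ut_parse_extended_lookup does in the search above, the subdomain extraction and dedup can be sketched in Python. The helper name and the hard-coded registered domain are assumptions for illustration; Splunk's lookup resolves suffixes against a proper public-suffix list.

```python
# Rough stand-in for ut_parse_extended_lookup: pull the subdomain label(s)
# out of an FQDN, given a known registered domain. Hard-coding the registered
# domain is an assumption; the real lookup uses a public-suffix list.
def subdomain_of(fqdn, registered_domain="uber.com"):
    suffix = "." + registered_domain
    if not fqdn.endswith(suffix):
        return None
    return fqdn[: -len(suffix)] or None

# Deduplicate subdomain labels, mirroring "table ut_subdomain | dedup ut_subdomain".
names = ["login.uber.com", "api.uber.com", "login.uber.com", "uber.com"]
subdomains = sorted({sub for sub in map(subdomain_of, names) if sub})
print(subdomains)
```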

DomLink domains -> Give me all subdomains

Combining this with my tool DomLink, released here, we can quite easily take the results from a search and ask Splunk to give us a list of all of the subdomains belonging to the target.

Begin by running the tool, and directing the output to a text file using the command line flags:
ConEmu64_2018-05-07_15-30-50
Now that we have a list of domains in the output file, we can create a new file with a header of name and add *. to the front of each domain using a simple regex replace:
sublime_text_2018-05-07_15-32-51
Now the file should look somewhat like this:
sublime_text_2018-05-07_15-33-14
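That prep step can also be scripted rather than done with a regex replace in a text editor; a minimal Python sketch, where the domain list stands in for DomLink's actual output:

```python
import csv

# Turn DomLink's output (a plain list of domains) into the Splunk lookup file:
# a CSV with a "name" header and "*." prefixed to each domain.
# The domain list below is a stand-in for DomLink's real output.
domains = ["uber.com", "ubereats.com", "uberinternal.com"]

with open("Book1.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name"])
    for domain in domains:
        writer.writerow(["*." + domain])
```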
Name this file Book1.csv (well, I did) and place it into C:\Program Files\Splunk\etc\system\lookups. Then make a search for:

index=main [inputlookup Book1.csv] | table name

Which will yield the following results:
chrome_2018-05-07_15-41-38
You can then export these results and feed them into whatever tool you want to use.

Password Dumps

I took the LeakBase BreachCompilation data and used Outflank's Password Dumps In ELK guide here to get my data into a reasonable format before importing it into Splunk. Running the script does a great job of getting the dump into a space-separated file format:
vmware_2018-05-07_19-48-41
I modified it to output to disk instead of pushing directly into ELK/Splunk.
Splunk does not take space-separated files, so I also had to modify the script Outflank provides to output a CSV format ready for importing.
Do a splunk add oneshot input.csv -index passwords -sourcetype csv -hostname passwords -auth "admin:changeme" and we're good to go!
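The space-separated-to-CSV tweak can be sketched as follows; the sample lines and the two-column layout are assumptions for illustration, and Outflank's actual script differs:

```python
import csv

# Convert space-separated dump lines (email<space>password, the layout assumed
# here) into a CSV that Splunk will ingest. Split on the first space only,
# since passwords themselves may contain spaces.
sample = [
    "alice@example.com hunter2",
    "bob@example.com correct horse battery staple",
]

with open("input.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["email", "password"])
    for line in sample:
        email, password = line.split(" ", 1)
        writer.writerow([email, password])
```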

Let's see what the most used passwords are!

index=passwords 
|  stats count by password 
|  sort 100 -count

chrome_2018-05-08_17-57-40
chrome_2018-05-08_17-58-29
If you're interested in base words, you can use fuzzy matching (matching on the word password):

index=passwords
| table password 
| fuzzy wordlist="password" type="simple" compare_field="password" output_prefix="fuzz"
| where fuzzmax_match_ratio >= 67
| stats count by password
| sort 100 -count

chrome_2018-05-08_18-11-32
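For a feel of what that fuzzy match is doing, here is the same idea sketched with Python's stdlib difflib. The scoring won't match the Splunk fuzzy lookup's ratio exactly, and the sample passwords are made up:

```python
from difflib import SequenceMatcher

# Score each candidate against a base word and keep those at or above a 67%
# similarity threshold, mirroring the "fuzzmax_match_ratio >= 67" filter.
def match_ratio(a, b):
    return SequenceMatcher(None, a, b).ratio() * 100

passwords = ["password", "passw0rd1", "P@ssword!", "letmein", "dragon"]
base_word = "password"
hits = [p for p in passwords if match_ratio(base_word, p.lower()) >= 67]
print(hits)
```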
Top passwords used by @facebook.com emails:
chrome_2018-05-08_18-30-19
Map it out as usual:
chrome_2018-05-08_18-32-47

The more interesting piece is the ability to easily cross-check passwords against employee OSINT e-mails. Sure, you can do this in grep too.

DomLink domains -> Give me all passwords for all domains

After running DomLink (as mentioned above in the post), we can go ahead and use its output as an inputlookup. This time, we will use *@domain.com instead of *.domain.com, and set a header field of email as shown below:
notepad_2018-05-08_19-14-02
Run a query:

index=passwords [inputlookup uber.csv] 
|  table email, password

Then I get a fantastic output in table format, which I can even export as CSV if I want:
ApplicationFrameHost_2018-05-08_19-32-13-1
That's pretty cool, at least I think so.

Employee OSINT

Profiling an organisation's employees is important when performing targeted phishing attacks, or password spraying across external infrastructure where you need a list of usernames to begin with. LinkedInt was developed for this purpose. I wanted a tool that could work reliably and give me a list of employees for a target organisation. A system like this isn't so necessary for just one or two particular data sets covering several companies; it would probably be more useful if you were to automate gathering employee data for the Alexa Top 1 Million websites and Fortune 500 companies, for example.

For the sake of this section, I'm going to demonstrate importing data from one unreleased (will probably release at HITB GSEC) tool to profile employees. This would also work with LinkedInt. Simply import the data as a CSV file as previously shown in the above sections.
chrome_2018-05-07_21-33-40
We can use Splunk to search for employees with a specific surname, name, or location:
chrome_2018-05-07_21-34-48
We can even go ahead and search for job positions, and see how many people are in a certain role at the company:
chrome_2018-05-07_21-48-26
Here we can see that the majority of people are delivery personnel as this company is the equivalent of Deliveroo in China.
Heck, with a bit of magic you can even graph it up to get a view of the distribution of roles:
chrome_2018-05-07_21-53-19
chrome_2018-05-07_21-55-42
Given that I only had data for around 5,000 employees, this gives a decent idea of the distribution of personnel at the target organisation before you go ahead and perform any social engineering, helping to answer the following types of questions:

  • Should I social engineer as an employee of the organisation?
  • What title should I have?
  • What's my role?
  • Where am I located? (yes, if there's a high distribution of X role at Y location)
  • What percentage of people are in this role?

In some companies, if there's a large percentage (almost 10%) who are delivery personnel, they probably don't have the access or the level of trust that the general population of employees has. So making these sorts of choices matters.

One last thing: I wonder how accurate my OSINT distribution is compared to the actual distribution of employees?
chrome_2018-05-07_22-06-49

Combining Employee OSINT with Password Dumps

It's trivial to combine Employee OSINT'd emails with password dumps to look for passwords:

index=passwords [search index=eleme 
| eval email=$Email$ 
| table email]

chrome_2018-05-09_01-38-37
Pretty nice, at least I think so. Sure, you can do this with grep too, but this is an alternative way to easily import a new CSV file and almost instantly have a matched up password list that you can export as a CSV or table.

Nmap Scans

ConEmu64_2018-05-07_20-14-12
Upload the .gnmap file to Splunk, set timestamps to none.
You can then run the following example query [1] to begin formatting the data and searching it in a nice manner within Splunk:

source="uber.gnmap" host="uber" sourcetype="Uber" Host Ports | rex field=_raw max_match=50 "Host:\s(?<dest_ip>\S+)" 
| rex field=_raw max_match=50 "[Ports:|,]\s?(?<port>\d+)\/+(?<status>\w+)\/+(?<proto>\w+)\/+(?<desc>\w+|\/)"
| rex field=_raw "OS:\s(?<os>\w+)"
| eval os = if(isnull(os),"unknown",os)
| eval mv=mvzip(port, status) 
| eval mv=mvzip(mv, proto) 
| eval mv=mvzip(mv, desc) 
| mvexpand mv 
| makemv mv delim="," 
| eval ports=mvindex(mv, 0) 
| eval status=mvindex(mv, 1)
| eval proto=mvindex(mv, 2)
| eval desc=if(mvindex(mv, 3) == "/","null",mvindex(mv,3))
| table dest_ip ports status proto desc os
| sort dest_ip

chrome_2018-05-07_20-37-20
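For reference, the same Host/Ports extraction can be sketched outside Splunk with Python's re module, using regexes along the same lines as the search above; the sample .gnmap line is fabricated:

```python
import re

# Pull the destination IP and the port/status/proto triples out of a .gnmap
# line, much like the rex commands in the Splunk search. Sample line only.
line = ("Host: 203.0.113.10 (example.uber.com)\t"
        "Ports: 22/open/tcp//ssh///, 443/open/tcp//https///")

dest_ip = re.search(r"Host:\s(\S+)", line).group(1)
ports = re.findall(r"(\d+)/(\w+)/(\w+)/", line)

print(dest_ip)
for port, status, proto in ports:
    print(port, status, proto)
```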
This would work pretty well for sorting out scans if you have a large set of scans to parse through. For example, you can expand the query to map all server locations with TCP port 443 open using:

source="uber.gnmap" host="uber" sourcetype="Uber" Host Ports 
| rex field=_raw max_match=50 "Host:\s(?<dest_ip>\S+)" 
| rex field=_raw max_match=50 "[Ports:|,]\s?(?<port>\d+)\/+(?<status>\w+)\/+(?<proto>\w+)\/+(?<desc>\w+|\/)"
| rex field=_raw "OS:\s(?<os>\w+)"
| eval os = if(isnull(os),"unknown",os)
| eval mv=mvzip(port, status) 
| eval mv=mvzip(mv, proto) 
| eval mv=mvzip(mv, desc) 
| mvexpand mv 
| makemv mv delim="," 
| eval ports=mvindex(mv, 0) 
| eval status=mvindex(mv, 1)
| eval proto=mvindex(mv, 2)
| eval desc=if(mvindex(mv, 3) == "/","null",mvindex(mv,3))
| table dest_ip ports status proto desc os
| sort dest_ip
| table dest_ip,ports 
| search ports=443 
| iplocation dest_ip 
| geostats count by City

chrome_2018-05-07_20-42-17

[1] Reference: How to parse Nmap in Splunk

Conclusion

Data manipulation and analytics play a relevant role in offensive cyber security operations. Much as password cracking is important, hence why we purchase 32-GPU cracking rigs, this is yet another good-to-have system that could help make your operations more streamlined and efficient.

Many people say that the gzipped comparison is not fair; however, ayn0r sheds some light on how Splunk actually operates using a lexicon structure.
Quote from ayn0r on Reddit:
chrome_2018-05-08_23-02-55