Mehmet Ergene

Threat Hunting and Detection Using Web Proxy Logs

Web proxy logs are a valuable source of telemetry for identifying suspicious behaviors and uncovering potential threats in your environment. These logs typically include details such as Duration, HTTP Status, Bytes In/Out, Protocol, HTTP Method, HTTP Version, URL Category, Hostname, Path, Query, MIME Type, File Name, and User-Agent.

This post outlines how each of these fields can be leveraged for threat hunting and detection use cases.

Duration

The Duration field shows how long a transaction took. Malware can communicate with its C2 server over HTTP(S), in which case it polls for commands periodically. The polling interval doesn't have to be constant, such as every 10 minutes; malware can add jitter to make its requests look random. Malware can also keep a connection open instead of polling. Either way, it has to ask for commands very often or hold the connection open, and both patterns tend to accumulate connection time.

Technique

Aggregate total connection duration per SourceIP-DestinationIP over a 12/24-hour period.

What to look for

Higher values may indicate beaconing. Keep in mind that not all beacons are malicious. That's why we are hunting.

Note: If you apply the same method to your public websites, you can detect web scraping or customer data scraping.
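As a rough illustration of the duration aggregation, here is a minimal Python sketch. The record layout and the field names (`src_ip`, `dst_ip`, `duration_s`) are assumptions made for the example, not a real proxy schema, and the sample is assumed to be already filtered to one 24-hour window.

```python
from collections import defaultdict


def total_duration_per_pair(records):
    """Sum connection duration (seconds) per (source IP, destination IP) pair."""
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["src_ip"], rec["dst_ip"])] += rec["duration_s"]
    # Longest cumulative duration first: these pairs are the hunting leads.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical records, already restricted to a single 24-hour window.
sample = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "duration_s": 3600.0},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "duration_s": 3500.0},
    {"src_ip": "10.0.0.7", "dst_ip": "198.51.100.20", "duration_s": 12.5},
]

ranked = total_duration_per_pair(sample)
for (src, dst), total in ranked:
    print(f"{src} -> {dst}: {total:.1f}s")
```

The top of the ranking is only a starting point: long-lived legitimate connections (streaming, chat, update agents) will show up there too.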

HTTP Status

Users visit websites, post content, sometimes upload data, or download files. Under normal conditions, most of these transactions return HTTP 200. Malware, however, can use HTTP error codes as a C2 channel. Many malware families also use a DGA (domain generation algorithm) to stay connected when one of their domains is blocked. In that case, the malware keeps receiving HTTP errors and moves on to the next domain.

Technique

Count HTTP status codes per SourceIP or SourceIP-DestinationIP over a specific time period.
List URLs having only HTTP Errors.

What to look for

Higher values of an uncommon HTTP Status Code may indicate C2 activity.
Higher values of HTTP errors for a website can indicate failed C2 activity.
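Both status-code techniques can be sketched in a few lines of Python. The field names (`src_ip`, `dst_ip`, `url`, `status`) are hypothetical, and treating every 4xx/5xx response as an "error" is an assumption you may want to tune.

```python
from collections import Counter, defaultdict


def status_counts(records):
    """Count each HTTP status code per (source IP, destination IP) pair."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[(rec["src_ip"], rec["dst_ip"])][rec["status"]] += 1
    return counts


def error_only_urls(records):
    """Return URLs for which every logged response was a 4xx or 5xx."""
    seen, succeeded = set(), set()
    for rec in records:
        seen.add(rec["url"])
        if rec["status"] < 400:
            succeeded.add(rec["url"])
    return sorted(seen - succeeded)


# Hypothetical records for one time window.
sample = [
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "url": "http://a.example/gate", "status": 404},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "url": "http://a.example/gate", "status": 404},
    {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "url": "http://a.example/gate", "status": 503},
    {"src_ip": "10.0.0.7", "dst_ip": "198.51.100.20", "url": "http://b.example/", "status": 200},
    {"src_ip": "10.0.0.7", "dst_ip": "198.51.100.20", "url": "http://b.example/", "status": 404},
]

counts = status_counts(sample)
errors_only = error_only_urls(sample)
print(dict(counts[("10.0.0.5", "203.0.113.10")]))
print(errors_only)
```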

Bytes In

Under normal conditions, when a user browses a website or downloads a file, each transaction has a different data size. Malware, on the other hand, requests the same page (URL) every time, so the downloaded content has the same size until the attacker starts interacting with the victim machine.

Technique

Count the occurrences of each BytesIn value per Source-Destination pair over 12/24 hours. Your best chance is while the attackers sleep, because there is no interaction.
Compute the ratio of count(BytesIn) per Source-Destination pair (for example, how often the same BytesIn value repeats relative to all requests for the pair). This helps compare attacker interaction with idle status.

What to look for

Higher values may indicate beaconing. C2 servers reply with the same data, making the Bytes In value the same.
Higher values of the ratio may indicate C2 beaconing.
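One possible reading of the BytesIn count and ratio is sketched below: for each pair, find the most frequent response size and the share of requests that returned exactly that size. This interpretation of "ratio", along with the field names `src_ip`, `dst_ip`, and `bytes_in`, is an assumption made for illustration.

```python
from collections import Counter, defaultdict


def repeated_size_ratio(records, field="bytes_in"):
    """Per pair: (most frequent size, its count, its share of all requests)."""
    sizes = defaultdict(Counter)
    for rec in records:
        sizes[(rec["src_ip"], rec["dst_ip"])][rec[field]] += 1
    result = {}
    for pair, counter in sizes.items():
        value, hits = counter.most_common(1)[0]
        result[pair] = (value, hits, hits / sum(counter.values()))
    return result


beacon = ("10.0.0.5", "203.0.113.10")
browse = ("10.0.0.7", "198.51.100.20")

# Hypothetical records: one host gets the same 512-byte reply four times,
# then a larger reply once the operator starts interacting.
sample = (
    [{"src_ip": beacon[0], "dst_ip": beacon[1], "bytes_in": 512}] * 4
    + [{"src_ip": beacon[0], "dst_ip": beacon[1], "bytes_in": 20480}]
    + [{"src_ip": browse[0], "dst_ip": browse[1], "bytes_in": size} for size in (1200, 48000, 310)]
)

stats = repeated_size_ratio(sample)
for pair, (size, hits, share) in stats.items():
    print(pair, size, hits, round(share, 2))
```

A share close to 1.0 means nearly every response had the same size, which fits an idle beacon. Pairs with only a handful of requests will also score high, so a minimum request count is worth adding in practice.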

Bytes Out

Normal user activity consists mostly of downloading data. Uploaded data is usually small unless the user uploads a file or data to a website.

Technique

Compute the total BytesOut per Source-Destination pair over 12/24 hours.
Compute the ratio of count(BytesOut) per Source-Destination pair over 12/24 hours.

What to look for

Higher values may indicate data exfiltration.
Higher values of the ratio may indicate beaconing.
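The "over 12/24 hours" part of the BytesOut total can be handled by bucketing timestamps into fixed windows. The sketch below assumes each record carries an epoch-seconds timestamp in a hypothetical `ts` field, alongside hypothetical `src_ip`, `dst_ip`, and `bytes_out` fields.

```python
from collections import defaultdict

WINDOW_S = 12 * 3600  # 12-hour buckets; use 24 * 3600 for daily totals


def bytes_out_per_window(records, window_s=WINDOW_S):
    """Sum bytes_out per (window start, source IP, destination IP)."""
    totals = defaultdict(int)
    for rec in records:
        # Round the timestamp down to the start of its window.
        bucket = rec["ts"] // window_s * window_s
        totals[(bucket, rec["src_ip"], rec["dst_ip"])] += rec["bytes_out"]
    return dict(totals)


# Hypothetical records: small uploads in the first window, a large one in the second.
sample = [
    {"ts": 0, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "bytes_out": 500},
    {"ts": 100, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "bytes_out": 700},
    {"ts": 43205, "src_ip": "10.0.0.5", "dst_ip": "203.0.113.10", "bytes_out": 50_000_000},
]

totals = bytes_out_per_window(sample)
for key, total in sorted(totals.items()):
    print(key, total)
```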

HTTP Method

Under normal circumstances, a user's web traffic contains a large number of HTTP GET requests and a small number of HTTP POST requests. Other HTTP methods, such as PUT, are expected even less often.

Technique

Compute the ratio of POST or PUT to GET per Source-Destination pair over 4/8/12/24 hours.

What to look for

Higher values of the ratio may indicate beaconing or exfiltration.
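A minimal sketch of the method ratio follows. The field names are hypothetical, and reporting infinity for pairs that upload but never issue a GET is a design choice of this example so that such pairs sort to the top rather than being dropped.

```python
from collections import Counter, defaultdict


def upload_to_get_ratio(records):
    """Ratio of (POST + PUT) to GET requests per (source IP, destination IP)."""
    methods = defaultdict(Counter)
    for rec in records:
        methods[(rec["src_ip"], rec["dst_ip"])][rec["method"].upper()] += 1
    ratios = {}
    for pair, counter in methods.items():
        uploads = counter["POST"] + counter["PUT"]
        gets = counter["GET"]
        if gets:
            ratios[pair] = uploads / gets
        else:
            # No GETs at all: the ratio is undefined, so flag it as infinite.
            ratios[pair] = float("inf") if uploads else 0.0
    return ratios


browse = ("10.0.0.7", "198.51.100.20")
beacon = ("10.0.0.5", "203.0.113.10")

# Hypothetical records for one time window.
sample = (
    [{"src_ip": browse[0], "dst_ip": browse[1], "method": "GET"}] * 8
    + [{"src_ip": browse[0], "dst_ip": browse[1], "method": "POST"}] * 2
    + [{"src_ip": beacon[0], "dst_ip": beacon[1], "method": "POST"}] * 5
)

ratios = upload_to_get_ratio(sample)
print(ratios)
```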

URL Hostname

Usually, users visit websites that are in a top 1M domains list. In some cases, an unpopular website can be visited by many users as well (think of third parties doing business with the company).

Technique

Compare with the top 1M domains and calculate the visit count.
Compute the visit count per Hostname.

What to look for

A hit count below 5 for a Hostname that is not in the top 1M may indicate malicious payload delivery.
A small hit count may indicate malicious payload delivery.
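The hostname check can be sketched as a count plus a set lookup. The `host` field name and the tiny `top_domains` set are placeholders for illustration. Note that popularity lists usually contain registered domains rather than full hostnames, so comparing hostnames directly, as done here, is a simplification.

```python
from collections import Counter


def rare_unlisted_hosts(records, top_domains, max_hits=5):
    """Hostnames seen fewer than max_hits times that are not in top_domains."""
    hits = Counter(rec["host"].lower() for rec in records)
    return sorted(
        (host, count)
        for host, count in hits.items()
        if count < max_hits and host not in top_domains
    )


# Stand-in for a real top 1M list loaded from a file.
top_domains = {"example.com"}

# Hypothetical records for one time window.
sample = (
    [{"host": "example.com"}] * 2
    + [{"host": "cdn-update.example.net"}]
    + [{"host": "partner.example.org"}] * 6
)

leads = rare_unlisted_hosts(sample, top_domains)
print(leads)
```

The frequently visited partner site is not in the list either, but its hit count keeps it out of the results, which matches the third-party case described above.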

URL Path

C2 beacons usually use the same URL path for C2 communication.

Technique

Compute the count per Source-Destination-URLPath combination.

What to look for

Higher values may indicate beaconing.
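If your proxy logs the full URL rather than a separate path field, the path can be extracted with the standard library before counting. This sketch assumes hypothetical `src_ip` and `url` fields and uses the URL's hostname as the destination.

```python
from collections import Counter
from urllib.parse import urlsplit


def path_counts(records):
    """Count requests per (source IP, hostname, URL path), most frequent first."""
    counts = Counter()
    for rec in records:
        parts = urlsplit(rec["url"])
        # The query string is ignored here so that changing parameters
        # do not split one beacon path into many rows.
        counts[(rec["src_ip"], parts.hostname, parts.path or "/")] += 1
    return counts.most_common()


# Hypothetical records for one time window.
sample = [
    {"src_ip": "10.0.0.5", "url": "https://c2.example/api/check?id=1"},
    {"src_ip": "10.0.0.5", "url": "https://c2.example/api/check?id=2"},
    {"src_ip": "10.0.0.5", "url": "https://c2.example/api/check?id=3"},
    {"src_ip": "10.0.0.7", "url": "https://news.example/a"},
    {"src_ip": "10.0.0.7", "url": "https://news.example/b"},
]

ranked_paths = path_counts(sample)
for key, count in ranked_paths:
    print(key, count)
```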

URL Query

URL query information appears, for example, when you search for an item on a website. Malware uses it the same way when it asks the C2 server whether there is anything to run on the victim machine, or when it sends its ID. The query can be encoded or encrypted as well.

Technique

Compute the count per Source-Destination-URLQuery combination.
Compute the length of URLQuery.
Look for base64-encoded strings in URLQuery.

What to look for

Higher counts may indicate beaconing.
Longer queries may indicate encoded data, a sign of exfiltration or beaconing.
Encoded strings may indicate beaconing or exfiltration.
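The query length and base64 checks could look something like the sketch below. The base64 test is only a heuristic: it requires a minimum length (16 characters, an arbitrary threshold chosen for this example), the base64 alphabet, and a clean decode. Long plain alphanumeric tokens can still pass it, so treat hits as leads rather than verdicts.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Standard and URL-safe base64 alphabets, with optional trailing padding.
B64_RE = re.compile(r"[A-Za-z0-9+/_-]+={0,2}")


def looks_base64(value, min_len=16):
    """Heuristic: long enough, base64 alphabet only, and decodes cleanly."""
    if len(value) < min_len or not B64_RE.fullmatch(value):
        return False
    standard = value.replace("-", "+").replace("_", "/")
    standard += "=" * (-len(standard) % 4)  # restore padding that URLs often drop
    try:
        base64.b64decode(standard, validate=True)
    except (binascii.Error, ValueError):
        return False
    return True


def query_findings(query):
    """Return (query length, parameter values that look base64-encoded)."""
    suspects = []
    for part in query.split("&"):
        key, sep, value = part.partition("=")
        # A bare blob without "key=" is checked as a value too.
        candidate = unquote(value if sep else key)
        if looks_base64(candidate):
            suspects.append(candidate)
    return len(query), suspects


# Hypothetical beacon query carrying an encoded host/user identifier.
encoded = base64.b64encode(b"hostname=WS01;user=alice").decode()
beacon_query = "id=42&data=" + encoded

print(query_findings(beacon_query))
print(query_findings("q=threat+hunting&page=2"))
```

The query is split by hand instead of with `urllib.parse.parse_qsl` because the latter turns `+` into a space, which would corrupt base64 values that were not percent-encoded.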

Mime(Content) Type

Unfortunately, most web proxies fail to determine the exact type of content.

Technique

List MIME types per Source-Destination pair.

What to look for

Uncommon MIME types may indicate a malicious file.

User Agent

Normally, every application has its own user agent string. Malware can try to mimic a legitimate application's user agent but sometimes gets it wrong by a small typo.

Technique

Count the occurrences of each user agent within the environment (long tail analysis).

What to look for

Lower values may indicate the presence of a malicious binary.
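For the long tail analysis, one option is to count how many distinct source hosts use each user agent and keep only the rarest ones. Counting distinct sources instead of raw requests is a choice made for this sketch (a single noisy host cannot push its user agent out of the tail), and the field names are hypothetical.

```python
from collections import defaultdict


def user_agent_long_tail(records, max_sources=1):
    """User agents seen on at most max_sources distinct source IPs."""
    sources = defaultdict(set)
    for rec in records:
        sources[rec["user_agent"]].add(rec["src_ip"])
    tail = [(ua, len(ips)) for ua, ips in sources.items() if len(ips) <= max_sources]
    # Rarest first, then alphabetical for a stable order.
    return sorted(tail, key=lambda item: (item[1], item[0]))


COMMON_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
TYPO_UA = "Mozila/5.0 (Windows NT 10.0; Win64; x64)"  # note the missing "l"

# Hypothetical records: three hosts share the common agent, one uses the typo.
sample = [
    {"src_ip": "10.0.0.5", "user_agent": COMMON_UA},
    {"src_ip": "10.0.0.6", "user_agent": COMMON_UA},
    {"src_ip": "10.0.0.7", "user_agent": COMMON_UA},
    {"src_ip": "10.0.0.7", "user_agent": TYPO_UA},
    {"src_ip": "10.0.0.7", "user_agent": TYPO_UA},
]

tail = user_agent_long_tail(sample)
print(tail)
```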

URL Category

In most environments, there are commonly blocked web categories such as Hacking, Pornography, and Dynamic DNS. Uncategorized websites are a pain, and sometimes this category has to be allowed for the sake of business continuity.

Technique

Query for Uncategorized, Dynamic DNS, and other suspicious categories. Compute the distinct count of SourceAddress by URLHostname.

What to look for

Small dcount values may indicate abnormal, suspicious, or malicious activity. If an uncategorized URL is visited by many users, it is less likely that the URL is malicious.
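The category filter and distinct count can be sketched as follows. Category names differ between proxy vendors, so the `SUSPICIOUS_CATEGORIES` set and the field names (`category`, `host`, `src_ip`) are placeholders.

```python
from collections import defaultdict

# Placeholder names: use the labels your proxy vendor actually emits.
SUSPICIOUS_CATEGORIES = {"Uncategorized", "Dynamic DNS"}


def distinct_sources_per_host(records, categories=SUSPICIOUS_CATEGORIES):
    """Distinct source count per hostname, limited to suspicious categories."""
    sources = defaultdict(set)
    for rec in records:
        if rec["category"] in categories:
            sources[rec["host"]].add(rec["src_ip"])
    # Fewest distinct sources first: those hosts deserve a look first.
    return sorted(
        ((host, len(ips)) for host, ips in sources.items()),
        key=lambda item: (item[1], item[0]),
    )


# Hypothetical records for one time window.
sample = [
    {"src_ip": "10.0.0.5", "host": "free-ddns.example", "category": "Dynamic DNS"},
    {"src_ip": "10.0.0.5", "host": "partner-portal.example", "category": "Uncategorized"},
    {"src_ip": "10.0.0.6", "host": "partner-portal.example", "category": "Uncategorized"},
    {"src_ip": "10.0.0.7", "host": "partner-portal.example", "category": "Uncategorized"},
    {"src_ip": "10.0.0.5", "host": "news.example", "category": "News"},
]

dcounts = distinct_sources_per_host(sample)
print(dcounts)
```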

HTTP Version

There are five HTTP versions: HTTP/0.9, HTTP/1.0, HTTP/1.1, HTTP/2, and HTTP/3. HTTP/1.1, HTTP/2, and HTTP/3 are all in common use today.

Technique

Analyze the HTTP versions seen in your traffic.

What to look for

HTTP/0.9 and HTTP/1.0 are old. Their presence may be an indication of malicious activity.

Protocol

Web proxies can determine the protocol by analyzing the traffic.

Technique

Compare ports with their standard protocols.

What to look for

A common protocol on an uncommon port, or an uncommon protocol on a common port, may indicate malicious traffic.
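A port-to-protocol comparison might look like the sketch below. The `STANDARD_PORTS` mapping is deliberately small and only illustrative, and the field names (`protocol`, `dst_port`) are assumptions. Extend the table with whatever your environment considers normal.

```python
# Illustrative mapping of protocols to their well-known ports.
STANDARD_PORTS = {"http": {80}, "https": {443}, "ftp": {21}, "ssh": {22}}


def port_protocol_mismatches(records, standard_ports=STANDARD_PORTS):
    """Flag records where the detected protocol and destination port disagree."""
    known_ports = set().union(*standard_ports.values())
    findings = []
    for rec in records:
        protocol = rec["protocol"].lower()
        port = rec["dst_port"]
        expected = standard_ports.get(protocol)
        if expected is not None and port not in expected:
            reason = "common protocol on uncommon port"
        elif expected is None and port in known_ports:
            reason = "uncommon protocol on common port"
        else:
            continue
        findings.append((rec["src_ip"], rec["dst_ip"], protocol, port, reason))
    return findings


# Hypothetical records: one normal, two mismatched.
sample = [
    {"src_ip": "10.0.0.5", "dst_ip": "198.51.100.20", "protocol": "HTTPS", "dst_port": 443},
    {"src_ip": "10.0.0.6", "dst_ip": "203.0.113.10", "protocol": "HTTP", "dst_port": 8443},
    {"src_ip": "10.0.0.7", "dst_ip": "203.0.113.11", "protocol": "unknown", "dst_port": 443},
]

findings = port_protocol_mismatches(sample)
for finding in findings:
    print(finding)
```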

File Name

It's not always possible to reliably log the names of files downloaded from the internet. When they are logged properly, file names can be used for hunting. Some malware droppers download randomly named files.

Technique

Perform entropy analysis on file names.
Compute the length of the file name.

What to look for

High entropy may indicate malicious payload delivery.
A short file name may indicate malicious payload delivery (1.bat, 3.exe, etc.).
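Entropy here can be measured as Shannon entropy over the characters of the file name. The sketch below computes it on the name without its extension, which is a choice made for this example. Any thresholds you apply on top are up to you, since very short names have low entropy by construction and need the separate length check.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(text).values()
    )


def filename_features(filename):
    """Length and entropy of the file name without its extension."""
    stem = filename.rsplit(".", 1)[0] if "." in filename else filename
    return {"stem_length": len(stem), "entropy": shannon_entropy(stem)}


# Hypothetical file names: an ordinary one, a random-looking one, a very short one.
for name in ("report.pdf", "x7Qp9zKf2LmW.exe", "1.bat"):
    features = filename_features(name)
    print(name, features["stem_length"], round(features["entropy"], 2))
```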


