HoneyHive: IoT IDS Framework with Distributed Honeypots

With the ever-increasing number of Internet-connected devices, the importance of cyber security increases as well. The number of Internet of Things (IoT) devices connected to the Internet has exploded over the past decade, jumping from 3.8 billion in 2015 to 7 billion in 2018 [1]. A major concern with many IoT devices is that they contain vulnerabilities that are often left unpatched [2][3]. To make matters worse, many of these IoT devices lack the modern security measures found on traditional computing devices, due to their inherent hardware limitations and to vendors prioritizing functionality and time to market over security [2]. While an insecure IoT device connected to a consumer’s probably already insecure home network does not cause much worry, insecure IoT devices connected to previously secure networks do. An attacker now has a vector into a previously locked-down network and can use the device as a pivot to gain access to the internal network [4]. If these devices were to become connected to Critical Infrastructure and Key Resources (CIKR) networks, the results could be catastrophic.

Honeypots are devices that are not part of routine network usage and are meant to alert on an attacker’s presence, capture tools, and record Tactics, Techniques, and Procedures (TTPs) [5]. Honeypots come in varying levels of sophistication and are available in a multitude of frameworks. Honeyd is one such framework that is capable of rapidly creating low-interaction honeypots by simulating the network stack. Lukas Stafira used Honeyd to develop three convincing web-based IoT honeypots, which are used in this research: a TITAThink camera, a Proliphix thermostat, and an ezOutlet2 power outlet [6].

Network Intrusion Detection Systems (NIDSs) are devices that analyze network traffic and create alerts when they see malicious or anomalous traffic. Intrusion detection is split into two categories: signature matching and anomaly detection. Signature matching uses known patterns of malicious traffic and creates an alert upon seeing a pattern. Anomaly detection, on the other hand, uses baselining and heuristics to create alerts when network traffic deviates from the network baseline. Signature matching is faster and easier to set up than anomaly detection but only alerts on traffic matching installed signatures. This means that signatures must be kept up to date, and any previously unknown exploit (zero-day) will not generate an alert. Anomaly detection can detect zero-day exploits but requires much more setup and can create many false positives, or false negatives, if the heuristics are not fine-tuned. Modern NIDSs are often a hybrid of the two detection techniques.

1.2 Motivation

Even networks with security measures in place are not immune to compromise. An example is the cyber attack against Ukraine in 2015, in which attackers successfully gained internal network access through a phishing campaign. After initial access, the attackers conducted internal network scans and credential harvesting over a period of several months. Using the gathered credentials, they gained access to Supervisory Control and Data Acquisition (SCADA) networks and took approximately 30 power stations offline, leaving more than 230,000 residents in darkness for several hours [7]. If the network had contained convincing SCADA honeypots, then it is possible that network administrators would have detected the attackers’ presence and been able to respond before the real SCADA systems were taken offline.

While honeypots do not guarantee network security nor are they the solution
to securing every network, they remain another viable tool for network defense. Because IoT devices lack sophisticated hardware and vendor support for security updates, other methods must be implemented to secure the network. Because so many IoT devices remain unpatched, unmonitored, and left on, they have become a tantalizing target for attackers seeking to gain network access or add another device to their botnet [8]. HoneyHive was developed in response to this popularity of IoT devices with attackers.

HoneyHive is a framework that uses distributed IoT honeypots as NIDS sensors that beacon back to a centralized Command and Control (C2) server. This research uses the IoT honeypots developed by Stafira, but HoneyHive is flexible enough to support any device capable of running the Python 2.7 HoneyB Agent script. Providing security for all IoT devices, given their heterogeneous nature, is a monumental task. HoneyHive instead offers another method for network intrusion detection, using the lure of vulnerable IoT devices as distributed honeypot intrusion detection sensors.

Because traditional NIDSs typically only monitor Switch Port Analyzer (SPAN) traffic from the switch they are located on, they can miss attacks located on other parts of the network. Typical placement of a NIDS is just inside a network’s external firewall in the Demilitarized Zone (DMZ) [9]. If an attacker manages to infiltrate an internal network without tripping the NIDS, then internal attacks or scans can be performed without raising an alert, as was the case with the Ukraine power network. The HoneyHive framework addresses this shortcoming of the traditional NIDS construct by using distributed IoT honeypot NIDS sensors.

1.3 Research Goals

The goal of this research is to first develop the HoneyHive framework and then test its effectiveness in network intrusion detection compared to that of a traditional NIDS.
The hypotheses for this research are:

1. The HoneyHive framework operates correctly by not alerting on routine network traffic and alerting on non-routine network traffic.
2. The HoneyHive framework detects intrusions that traditional NIDSs cannot through the use of distributed IoT honeypot sensors and packet capture aggregation.

1.4 Approach

In order to determine HoneyHive’s effectiveness at network intrusion detection, several steps must be taken. These include the development of the HoneyHive framework, setting up the simulated network for experimentation, and then designing and running the experiment.

1.4.1 HoneyHive Framework

To develop the HoneyHive framework, Honeyd and Stafira’s IoT honeypots are first set up. Then the individual components of the framework are developed, including the C2 server, transfer server, Snort log parser, Database (DB), and HoneyB Agent script. Snort is also integrated into the framework for increased signature matching. The HoneyHive framework is explained in more depth in Chapter 3.

1.4.2 Simulated Network

A simulated network also needs to be set up in order to run the experiment. The network is composed of IoT devices, Stafira’s honeypots (duplicated several times), Windows 10 devices running Ubuntu Virtual Machines (VMs), Ubuntu VMs running the Honeyd honeypots and HoneyB Agent script, the HoneyHive C2 server, Suricata, an Ubuntu attacker machine, and networking devices. The network layout is described in Chapter 3, and the actual devices used on it and in the experiment are described in Chapter 4.

1.4.3 Experiment

After development, HoneyHive’s effectiveness at network intrusion detection is tested in a simulation where an attacker has gained access to the internal network, has narrowed down their list of targets through previous reconnaissance, and is now performing internal nmap network scans against specific Internet Protocol (IP) addresses before launching exploits against them. The launching of exploits on scanned devices is not tested in this experiment; the exploitation and propagation stage would ideally be prevented by network administrators acting on alerts from the HoneyHive framework.

The tests in this experiment involve four types of scans and four levels of active honeypots. The scan types are No Scan (Control Group), TCP Connect scan, Aggressive scan, and NIDS Avoidance scan. The honeypot levels are 0, 3, 6, and 9 honeypots. Every scan type is paired with every honeypot level in a full factorial experiment, resulting in 16 different combinations. Each combination is run 30 times for a total of 480 runs. Because of the timing and coordination required to run the experiment, gather results, and reset devices to their initial state after each run, the runExperiment.py script automates this process. This script is discussed further in Chapter 4 and is found in Appendix
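
The full factorial design described above can be enumerated in a few lines of Python. The sketch below is illustrative only and is not the runExperiment.py script; the names SCAN_TYPES, HONEYPOT_LEVELS, REPETITIONS, and build_run_plan are assumptions introduced here, and only the factor levels and the repetition count come from the text.

```python
from itertools import product

# Factor levels taken from the experiment description.
SCAN_TYPES = ["No Scan", "TCP Connect", "Aggressive", "NIDS Avoidance"]
HONEYPOT_LEVELS = [0, 3, 6, 9]
REPETITIONS = 30


def build_run_plan(scan_types, honeypot_levels, repetitions):
    """Return one (scan, honeypots, repetition) tuple per experimental run."""
    return [
        (scan, level, rep)
        for scan, level in product(scan_types, honeypot_levels)
        for rep in range(1, repetitions + 1)
    ]


run_plan = build_run_plan(SCAN_TYPES, HONEYPOT_LEVELS, REPETITIONS)
combinations = {(scan, level) for scan, level, _ in run_plan}
print(len(combinations), "combinations,", len(run_plan), "runs")
```

Pairing every scan type with every honeypot level gives 4 x 4 = 16 combinations, and 30 repetitions of each gives the 480 runs stated above.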

1.5 Assumptions and Limitations

1.5.1 Assumptions

The following assumptions are made in this research:

1. Routine network traffic on the simulated network does not contain any traffic a NIDS would treat as malicious.
2. Given the same set of rules, NIDSs create the same number of distinct alerts and the same number of total alerts when analyzing an identical sample of network traffic.

1.5.2 Limitations

1.5.2.1 HoneyHive

Several limitations currently exist in the HoneyHive framework. The HoneyB Agent script is written in Python 2.7, which is near the end of its life. Additionally, HoneyHive relies on a NIDS (Snort) to perform signature matching instead of being self-contained and possessing its own sophisticated intrusion detection system.

1.5.2.2 Honeypots

While the honeypots in this experiment are useful for testing hypotheses, implementing modern and more sophisticated honeypots would improve the HoneyHive framework. Honeyd is outdated and no longer regularly maintained [10][11]. Also, Stafira’s honeypots are low-interaction and are only convincing with web traffic.

1.6 Research Contributions

The HoneyHive framework offers increased network intrusion detection to all networks it is deployed to. It can be integrated into CIKR-based networks, since IoT devices share some similarities with Industrial Control Systems (ICSs). In addition, government organizations or commercial companies that work in cyber security could integrate HoneyHive into their existing network security architecture. The impact of this framework is a cross-platform, standalone NIDS and network monitoring solution capable of improving the rate at which network intrusions are detected. While HoneyHive may not be the solution for every network, it is a viable tool for increasing network security through intrusion detection.

1.7 Thesis Overview

Chapter 2 provides background information and related research on the state of IoT devices, IoT and Computer Network Security, NIDS and Network Monitoring, and honeypots and honeytokens. It also provides details about software and programming languages used in this thesis. Chapter 3 describes the HoneyHive framework design and components in depth and explains the rationale behind design decisions. Chapter 4 describes the methodology for running the experiment and the research questions posed for this thesis. The methodology includes all parameters, factors, metrics, and a step-by-step procedure to replicate the experiment. Chapter 5 presents the experiment results and provides analysis. Finally, Chapter 6 provides a summary and conclusion for this thesis as well as future work to improve the HoneyHive framework and, hopefully, IoT and Computer Network Security.

II. Background and Related Research

2.1 Overview

Section 2.2 of this chapter provides background information on IoT, IoT and Computer Network Security, NIDSs, network monitoring, honeypots, and honeytokens. It also covers Honeyd 1.5c, Cyber Deception, and the programming languages and tools used in this research. Section 2.3 explores related research and emerging technologies in the field of IoT and honeypots.

2.2 Background

2.2.1 Internet of Things (IoT)

The term IoT covers a myriad of devices and appliances with capabilities to sense the world around them, process information, and share this information with other devices on an internal network or the Internet at large [12]. Simple sensors and appliances now have the computing power for making intelligent decisions, as well as communication abilities for sharing perceived data and being remotely interacted with [13]. Suo et al. break IoT device functionality into four layers: the application layer, support layer, network layer, and perceptual layer. The application layer is the
actual service displayed to the user such as a web page, application, or screen. The support layer acts as the intermediary between the application and network layers and involves cloud computing to bring increased performance. The network layer deals with transmitting data between devices through numerous different communication protocols. Finally, the perceptual layer is responsible for collecting data in the physical world and converting it to digital data through the use of sensors, cameras, Radio-Frequency Identification (RFID), Global Positioning System (GPS), transducers, thermostats, etc. [12].

The majority of IoT devices communicate on a Wireless Personal Area Network (WPAN) with a 10 meter range using one of several Institute of Electrical and Electronics Engineers (IEEE) protocols. These protocols mainly include Bluetooth (IEEE 802.15.1), Ultra-Wideband (UWB) (IEEE 802.15.3), and Zigbee (IEEE 802.15.4) [13][14][15][16][17]. Some devices instead utilize Wireless Fidelity (Wi-Fi) (IEEE 802.11) for communication over a Wireless LAN (WLAN) with a range of up to 100 meters. However, the devices used in this research, and those that would be deployed as honeypots, communicate over Ethernet (IEEE 802.3), which is the focus of this research.

The added functionality of smart devices makes them very appealing and has caused the IoT market to explode over the past decade. As shown in Figure 1, the total number of devices connected to the Internet in 2018 was 17.8 billion, 7 billion of which were IoT devices. By 2025, the total number of devices connected to the Internet is expected to grow to 34.2 billion, with IoT devices comprising 21.5 billion of the total. The IoT market is anticipated to reach $1.6 trillion by 2025, making it a lucrative and competitive market [1]. With such a competitive market, vendors are scrambling to be the first to release the latest and greatest product, often cutting corners in areas like security to reduce cost and production time. IoT devices are often assembled from inexpensive, outdated, and insecure third-party components that are no longer supported or patchable [2]. Insecure IoT devices are an alarming problem in network security for consumers, corporations, the Department of Homeland Security (DHS), and the Department of Defense (DoD).

2.2.2 IoT and Computer Network Security

Figure 1. Growth of IoT Devices from 2015-2025 [1]

IoT creates new possibilities for technologies never before imagined but also opens up new vulnerabilities and attack vectors for malicious hackers [4]. These new attack vectors arise from a lack of security in devices. Unfortunately, some vendors in this market are not primarily concerned with the security of their devices. Their main focus is to rapidly develop innovative and easy-to-use technology before competitors and turn a profit. This mindset often leaves many IoT devices ripe for exploitation.

Many IoT devices are riddled with vulnerabilities due to vendors focusing on cheap solutions and rapid development in a competitive market. HP performed a study and found that 70 percent of IoT devices contain vulnerabilities. When researching 10
of the most popular IoT devices, they found an average of 25 security vulnerabilities per device, with over 250 vulnerabilities in total [3]. Outdated software is loaded on devices that are then often never updated, or that are very cumbersome for the average user to update. This results in millions of IoT devices with known unpatched vulnerabilities connected to the Internet, just waiting for attackers to exploit them. Attackers can quickly discover devices with known vulnerabilities using websites such as Shodan.

While being able to lock a house, switch lights on or off, or adjust a thermostat remotely can be desirable, the inclusion of these IoT devices opens up significant vulnerabilities in networks. A once-secure network can now be accessed by exploiting a vulnerable IoT device and using it as a pivot into the otherwise unreachable network. Because these devices, unlike web servers, are not intended to be accessed by just anyone, they are not placed inside DMZs but are instead placed deeper within the network. Attackers are still able to reach these vulnerable IoT devices if they first compromise a DMZ device or a different internal device and use that device as a pivot to the IoT device. Traffic is already allowed to DMZ devices, but a misconfigured router or a router with port forwarding can allow internal devices to be compromised as well.

The threat of IoT devices being hacked is not just theoretical; in 2014, smart meters were hacked, allowing attackers to spoof messages between nodes. With spoofed messages, attackers could avoid paying their monthly utility bill or shut down energy from the utility company altogether, without the use of any kinetic effects. The attacker’s ability to shut down energy demonstrates the cyber threat to CIKR networks; they are susceptible and can be disabled as well [21].

Many IoT devices have been found that allow logins with empty, default, or weak passwords. IoT devices are becoming increasingly common targets for use in botnets. While IoT devices may have limited computational power, their sheer number and ease of exploitation have made them an enticing target for attackers. In addition, these devices are not often updated and have limited user interaction, allowing attackers to go unnoticed on a network for a prolonged period of time [22]. Take the Mirai (Japanese for “the future”) worm, for example: hundreds of thousands of IoT devices have been compromised and assimilated as part of botnets since its release in 2016. Mutations of the Mirai worm remain prevalent today because of the lack of security implemented in IoT devices [8]. These botnets, consisting of up to 400,000 devices, are available for purchase and have been used to successfully execute Distributed Denial of Service (DDoS) attacks on a number of web servers [23].
Because IoT devices lack sophisticated hardware and are so diverse, traditional methods for securing them, such as installing antivirus software or automatic updates, are typically not possible. Kolias et al. argue that the vendor is responsible for implementing automatic updates and better security in devices [8]. IoT devices rarely receive updates to fix vulnerabilities, and on the off chance they do, an even smaller percentage of users take the time to manually install the updates [2]. The average user plugs the IoT device into their network without changing default passwords and never manually checks for or installs updates [20].

2.2.3 Network Intrusion Detection System (NIDS)

One common device used to increase computer network security is a NIDS. A NIDS can be deployed on a network to detect malicious traffic and intrusions. It can be placed either inline, which can affect network latency because all traffic passes through the NIDS before being forwarded to its destination, or in a mirrored configuration, where copies of all traffic are sent to the NIDS in addition to the original destination. NIDSs can use multiple techniques for intrusion detection, including signature (pattern) matching and baselining (anomaly detection). Signature-based detection searches network traffic for patterns defined in rule-sets and creates an alert if there is a match. Baselining involves taking a snapshot of normal traffic on a network and then using heuristics to generate an alert for any behavior that is abnormal (an anomaly). Modern NIDSs employ a combination of the two techniques, as both have advantages and disadvantages. Signature-based detection is great at alerting on known exploits but is unable to alert on zero-day exploits, which results in false negatives. Baselining, on the other hand, requires creating a network traffic standard that, if deviated from, causes an alert. If the network or its routine traffic changes, then a new baseline has to be established. Anomaly detection can create numerous false alerts (false positives) and requires significant setup time for learning the network baseline. However, anomaly detection is capable of detecting previously unknown (zero-day) exploits.
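
The contrast between the two techniques can be made concrete with a toy example. The following Python 3 sketch is an illustration under simplifying assumptions, not a description of how Snort or any production NIDS works: the signature names and patterns, the (protocol, port) baseline heuristic, and all function names are hypothetical.

```python
import re

# Hypothetical signatures: alert name -> byte pattern searched for in a payload.
SIGNATURES = {
    "directory-traversal": re.compile(rb"\.\./\.\./"),
    "default-credentials": re.compile(rb"admin:admin"),
}


def signature_alerts(payload):
    """Signature matching: return the names of all patterns found in payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern.search(payload)]


def learn_baseline(routine_flows):
    """Baselining: record every (protocol, destination port) seen in routine traffic."""
    return set(routine_flows)


def is_anomalous(flow, baseline):
    """Anomaly detection: any flow absent from the learned baseline raises an alert."""
    return flow not in baseline


baseline = learn_baseline([("tcp", 80), ("tcp", 443), ("udp", 53)])

known_attack = signature_alerts(b"GET /../../etc/passwd HTTP/1.1")
novel_attack = signature_alerts(b"\x90\x90 previously unseen exploit bytes")
print(known_attack)                           # matched by an installed signature
print(novel_attack)                           # no signature installed: nothing reported
print(is_anomalous(("tcp", 2323), baseline))  # service never seen in routine traffic
print(is_anomalous(("tcp", 8080), baseline))  # also flagged, even if the service is benign
```

In this sketch the unseen payload slips past the signature matcher (a false negative), while a baseline that has not been relearned after a benign service is added on port 8080 still raises an alert (a false positive).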
This research proposes the use of honeypots as NIDSs. Using honeypots as a NIDS is not a novel idea. Spitzner argued that they make more effective NIDSs than traditional ones because they reduce false positives: any traffic sent to them is suspicious [24].

2.2.4 Network Monitoring

Network monitoring software works closely with intrusion detection sensors but
is focused on overall network traffic patterns and determining whether or not devices
are reachable. While NIDS inspect the content of traffic, network monitors record
the volume and types of traffic on the network. Network monitoring helps network operators quickly identify overloaded network links and devices. It is useful for bringing devices back online and can detect spikes in traffic, which may indicate a Denial of Service (DoS) attack. Furthermore, if a device does go down, it can signal that an attacker launched an exploit that resulted in a crashed service. NIDSs and network monitoring used together provide a better picture of network health while still closely investigating traffic for malicious intent. The HoneyHive framework developed
in this research provides both network monitoring and network intrusion detection
for networks it is deployed to.
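
The two monitoring checks mentioned above, traffic spikes and device reachability, can be sketched in Python 3 as follows. This is a minimal illustration and not HoneyHive's implementation; the function names, the moving-window heuristic, the threshold factor, and the timeout value are all assumptions chosen for the example.

```python
from statistics import mean


def find_traffic_spikes(byte_counts, window=5, factor=3.0):
    """Return indices of intervals whose volume exceeds factor times
    the mean volume of the preceding `window` intervals."""
    spikes = []
    for i in range(window, len(byte_counts)):
        recent = mean(byte_counts[i - window:i])
        if recent > 0 and byte_counts[i] > factor * recent:
            spikes.append(i)
    return spikes


def unreachable_devices(last_seen, now, timeout=60):
    """Return devices that have not been heard from for more than `timeout` seconds."""
    return sorted(dev for dev, ts in last_seen.items() if now - ts > timeout)


# Hypothetical per-interval byte counts; interval 6 carries a sudden burst.
volumes = [1200, 1100, 1300, 1250, 1150, 1220, 9800, 1180]
spikes = find_traffic_spikes(volumes)

# Hypothetical last-contact timestamps (seconds) for three devices.
last_seen = {"10.0.0.5": 1000, "10.0.0.6": 940, "10.0.0.7": 990}
down = unreachable_devices(last_seen, now=1010)

print(spikes)
print(down)
```

Neither check inspects packet contents, which is the distinction drawn above between network monitoring and a NIDS.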

2.2.5 Honeypots

One way to mitigate or detect exploited devices is the use of honeypots. Honeypots
are used to increase the security of computer networks by emulating real devices
attackers might be interested in compromising. A honeypot being interacted with can
be one of the first signs of compromise in a network or of an impending attack, and
can therefore act as a NIDS. By using honeypots, previously unknown vulnerabilities
(zero-day vulnerabilities) may be discovered when an adversary targets and gains
access to the device. In addition, honeypots leave known vulnerabilities unpatched so that an adversary’s TTPs can be learned and later used to fingerprint an actor that employs those specific TTPs.
Honeypots come in all kinds of shapes, sizes, and implementations. They range from simple scripts to virtual devices to physical devices, and support low to high interaction.
Low-Interaction Honeypots. Low-interaction honeypots can simulate common network services and the network stack. However, when a known exploit is received, the attacker does not gain full control of the device because the command terminal spawned is simulated. This also means that zero-day exploits are not captured, since they are outside what the honeypot knows how to react to. Because the attacker cannot gain full control of the honeypot, low-interaction honeypots are safer to deploy in a network, but at the expense of being easier for attackers to identify as honeypots.
High-Interaction Honeypots. High-interaction honeypots, in contrast, do not emulate network services or the network stack but do allow the attacker to gain full control of the device. This not only makes for a more believable honeypot but also supports gathering more information about the attack, such as zero-day exploits, tools, and TTPs used by the hacker. Although there are many benefits to high-interaction honeypots, they also have disadvantages. While the honeypot looks more convincing to the attacker because of the full control allowed, it presents an increased security risk to the network. In addition, high-interaction honeypots are costly to develop in both time and resources; they require more maintenance and oversight than low-interaction honeypots [24][25][26][27][28].
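
The reason a honeypot can double as an intrusion sensor is that it has no routine users, so any traffic addressed to it is worth an alert. The Python 3 sketch below illustrates that rule on a list of already-captured packets; it is not the HoneyB Agent, and the addresses, the packet tuple layout, and the function name are hypothetical.

```python
from collections import defaultdict

# Hypothetical addresses assigned to honeypots on the monitored network.
HONEYPOT_IPS = {"192.168.1.50", "192.168.1.51", "192.168.1.52"}


def honeypot_alerts(packets, honeypot_ips):
    """Group honeypot-bound traffic by source address.

    packets is an iterable of (source IP, destination IP, destination port).
    No payload inspection is needed: touching a honeypot is itself suspicious.
    """
    touched = defaultdict(set)
    for src, dst, dst_port in packets:
        if dst in honeypot_ips:
            touched[src].add((dst, dst_port))
    return dict(touched)


packets = [
    ("192.168.1.10", "192.168.1.20", 443),  # routine traffic between real hosts
    ("192.168.1.99", "192.168.1.50", 80),   # probe of a honeypot
    ("192.168.1.99", "192.168.1.51", 23),   # second honeypot probed by the same source
]
alerts = honeypot_alerts(packets, HONEYPOT_IPS)
for source, targets in alerts.items():
    print(source, "contacted", len(targets), "honeypot service(s)")
```

Routine traffic between real hosts produces no alert, which is the source of the low false-positive rate attributed to honeypots.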

2.2.6 Honeytokens

Cymmetria breaks honeytokens into several subcategories: breadcrumbs, “beacons”, and tokens [29]. Breadcrumbs are data left intentionally for a hacker to find
and use to allow them to move throughout the network. However, by using breadcrumbs an attacker only moves through a controlled path of devices monitored by
network defenders. All the while, an attacker’s TTPs, which include commands,
tools, and exploits are being recorded. Cymmetria’s “beacons” create alerts whenever they are interacted with. They are not part of routine usage (like honeypots)
by the organization so any interaction is considered malicious and can even help to
identify insider threats. Examples of “beacons” include decoy shares, documents with
embedded macros, and websites that all beacon back to a C2 server when touched.
The “beacons” defined by Cymmetria are essentially various types of intrusion detection sensors. Cymmetria’s last category, tokens, are Honeydocs (fake documents)
that act as a beacon to alert that a file was exfiltrated out of the network. The main
difference between beacons and tokens, as defined by Cymmetria, is that “beacons”
reside on the organization’s internal network and tokens are meant to detect data
leaving the network [30].

2.2.7 Honeyd 1.5c

One common framework to create virtual honeypots is Honeyd. The version of
Honeyd used and described is 1.5c and was last updated in May of 2007. Honeyd
1.6d is available on GitHub with a last commit in December 2013 but, as noted by Stafira, contains program stability issues [6].
Honeyd simulates the network stack to allow one physical device to act as numerous honeypots. All traffic for the honeypots is sent to Honeyd which makes it look
like the devices are running independently on separate IPs. Honeypots can also be
customized by using Nmap DB files to deceive scanning and fingerprinting software.
The Nmap DB file defines how different Operating Systems (OSs) and their respective versions respond to messages as well as ports and services that are running by
default [25][31]. While not identical, the Nmap DB file can be used to closely match
network fingerprints of IoT devices. One deficiency Stafira noted about Honeyd was
the outdated Nmap DB files [6].
Within the Honeyd configuration file, low-interaction honeypots can be quickly created. Each honeypot can be assigned a personality defined by the Nmap DB file, have its ports individually set to open, filtered, or closed, and run custom shell scripts on open ports. Running customized scripts is one of the selling features of Honeyd; with sophisticated enough scripts, entire services can be mimicked. In theory, creating scripts to match every service would yield a convincing and very interactive honeypot, but emulating the operating system itself, as well as Inter-Process Communication (IPC), would be painstakingly time consuming, and better alternatives such as VMs exist for high-interaction honeypots. Honeyd is designed for, and better suited to, quickly creating numerous honeypots and simulating a handful of services to provide a low-interaction honeypot framework.
The IP and Media Access Control (MAC) addresses of the honeypot are also configurable, but the IP must be on the same network as Honeyd. MAC addresses
can be used to identify the type of device and manufacturer, so allowing customization
leads to more convincing honeypots. However, with these customization options, it
is imperative that both the IP and MAC addresses be unique on a network in order
to prevent collisions [31].
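
The addressing constraints above (each IP on the Honeyd host's network, and no IP or MAC reused) are easy to check before a configuration is deployed. The Python 3 sketch below shows one way to do so with the standard ipaddress module; it is not part of Honeyd or HoneyHive, the honeypot records and function name are hypothetical, and it only compares honeypots against each other, not against real devices already on the network.

```python
import ipaddress


def validate_honeypots(honeypots, network_cidr):
    """Return a list of addressing problems; an empty list means none were found."""
    network = ipaddress.ip_network(network_cidr)
    problems = []
    seen_ips, seen_macs = set(), set()
    for hp in honeypots:
        ip = ipaddress.ip_address(hp["ip"])
        mac = hp["mac"].lower()  # compare MACs case-insensitively
        if ip not in network:
            problems.append("{}: {} is outside {}".format(hp["name"], ip, network))
        if ip in seen_ips:
            problems.append("{}: duplicate IP {}".format(hp["name"], ip))
        if mac in seen_macs:
            problems.append("{}: duplicate MAC {}".format(hp["name"], mac))
        seen_ips.add(ip)
        seen_macs.add(mac)
    return problems


# Hypothetical honeypot definitions containing three deliberate mistakes.
honeypots = [
    {"name": "camera", "ip": "192.168.1.50", "mac": "00:11:22:33:44:55"},
    {"name": "thermostat", "ip": "192.168.1.50", "mac": "00:11:22:33:44:56"},
    {"name": "outlet", "ip": "10.0.0.9", "mac": "00:11:22:33:44:55"},
]
problems = validate_honeypots(honeypots, "192.168.1.0/24")
for problem in problems:
    print(problem)
```

Here the thermostat reuses the camera's IP, and the outlet is both outside the network and reusing the camera's MAC, so three problems are reported.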
The Honeyd documentation states that the “-l” flag logs packets and connections to a specified file. However, the resulting log contains only time stamps,
IP addresses, ports, protocols, and transmission byte counts; the actual packet contents captured by Honeyd are not included. While advanced methods did exist to
receive the contents of the packet capture from Honeyd, none are viable now due to
the out-of-date libraries and compilation errors. Furthermore, Provos stated that he
expected a NIDS or other scripts to be run in tandem with Honeyd [31]. Implementing a full packet capture whose contents are accessible is one of the deficiencies this
research plans to address in the Honeyd framework.
Honeyd can be compiled with internal Python services that allow interacting with
Honeyd through either honeydctl (Honeyd Control) or Python scripts while it is still
running. This allows for the creation of dynamic honeypots. Honeydctl connects to
Honeyd and presents the user with a console for issuing commands. These commands
allow listing running honeypots, modifying, or deleting them. One noteworthy command is “!” which allows sending Python commands directly to Honeyd. By simply
importing the honeyd module in honeydctl (after issuing !) or a Python script, the
user now has access to all the data received and transmitted by Honeyd. Having
access to this data would allow creating packet captures and signatures for received
data. Unfortunately, Honeyd has compilation issues because of out-of-date library dependencies, and the version of Honeyd 1.5c that can be installed from the Ubuntu packages list is not built with internal Python services. Therefore, this research performs its own packet capture for signature creation and forensic analysis [31].

Honeydstats and Honeyview. Honeydstats and Honeyview are plugins that allow analyzing the log that Honeyd generates from received traffic. Honeydstats is a text-based representation of the packet-level data received (very similar to Honeyd’s “-l” option), while Honeyview is a web-based GUI representation. Both Honeydstats
and Honeyview focus on the OS versions, destination ports, country codes, and IP
addresses from attackers [31]. Although statistics can be aggregated from across the
network, they both still rely on log files. The information they provide is not in depth
enough for forensic analysis nor fast enough for today’s cyber attacks. This research
hopes to provide a framework with real time alerts while capturing detailed evidence
for forensic analysis.

2.2.8 Cyber Deception

In their 2017 Cyberthreat Defense Report, the CyberEdge Group recommended
that cyber deception technology should include coverage for IoT devices [32]. While
not IoT specific, Cymmetria is leading the way on Deception Campaigns. They define
cyber deception as “baiting, studying, investigating, fingerprinting, and/or smoking
out” attackers. Through the use of cyber deception, organizations can prevent attackers from moving freely throughout their network and impose an increased cost
to attack the defended network. Cymmetria explains that cyber deception is more
than just implementing everything honey (honeypots, honeynets, and honeytokens),
as traditional honey technology is difficult to integrate into a network, expensive to
develop and maintain, and easy for attackers to detect. It is the creation, management, and monitoring of these false devices in order to manipulate attackers through
a predefined and monitored path of the network (or as Cymmetria defines, “orchestration and virtualization”). Even if the attacker realizes there are fake devices and documents, the speed of their attack and propagation is hindered because now they
have to spend extra time verifying if a target is real or a honeypot. All the while,
the attacker’s tools and exploits are at risk of being captured. If they are captured,
then a signature or patch can be created and propagated throughout the networks,
or even worse for the attacker, reported to vendors and antiviruses and distributed
throughout the world. This renders the attacker’s tools worthless and requires them
to spend more time and money to create new ones. While traditional honeypots could
still pose this threat to attackers, alerts and all collected information on the attacker
are not available in real time [30].

2.2.9 Programming and Languages

JavaScript. JavaScript is one of the staples of web development today. It is an interpreted language that allows interacting with and modifying the HyperText Markup Language (HTML) Document Object Model (DOM), which enables creating dynamic web pages that update without requiring the user to reload them. Through the use of Node.js and
Electron, applications such as HoneyHive can be created with dynamic Graphical
User Interfaces (GUIs) [33].

Node.js Node.js is a standalone, open-source runtime environment for JavaScript.
It is built on Google’s V8 engine, which is also used in Chrome, and possesses the functionality to interact with the OS that JavaScript normally does not have since it is
sandboxed in the browser. This extra functionality ranges from interacting with local
files to networking modules for creating a full-fledged server. The OS can be queried
for information and concurrency can be implemented through the use of child processes. These are just a handful of built-in modules, but using the Node.js Package
Manager (NPM), modules can quickly be downloaded and installed for use in an application. Node.js is so versatile because it runs exactly the same across all platforms.

Electron Electron is a module for Node.js that allows creating cross-platform
GUIs. It utilizes HTML, Cascading Style Sheets (CSS), and JavaScript to render the
GUI and is essentially the same as coding a standalone web page for the application. This makes creating a GUI that works on any platform relatively fast and easy
because of all the CSS frameworks available. Many applications have already been
written in Electron, such as Atom, Visual Studio Code, Discord, and Slack, to name
a few. Electron applications can even be bundled into executable files for ease of
distribution using the Electron Packager module [35].

Python Python is an interpreted language with two currently supported versions, 2.7.X and 3.7.X [36]. The Python code written in this research uses 2.7.X
because of the greater community support for Python modules available and because
Lukas Stafira, whose work is built upon in this study, created his IoT honeypots
with version 2.7.X. Modules can be quickly installed in Python by using its package
manager, pip [37].

2.2.10 Tools

This section covers the essential tools used in this research for testing and results.
These tools include Nmap, packet capturing software (Wireshark and TCPDump) and
VMware Workstation. Docker is also discussed because several related researchers
utilize it, and future work on this research references it.

Nmap Nmap is an open source port scanning tool used by attackers and
security professionals alike for reconnaissance and vulnerability analysis on networks
and their devices [38]. It provides information on a device’s ports, the services running
on the ports, and the suspected OS. All gathered information from ports, running
services, and the Time to Live (TTL) in packet responses are compared against the
Nmap DB to make a best guess about the target’s OS. Nmap scans are customizable,
from the type of host discovery performed, to the way ports are scanned for services, all
the way to firewall and NIDS evasion. Using firewall and evasion flags, an attacker
is able to spoof their MAC address, source port, checksum, and TTL, and even modify their Maximum
Transmission Unit (MTU), which results in smaller fragmented packets. The timing
and performance flags allow adjusting timeouts and data transmission for faster or
slower scans. If used in combination, these flags can result in scanning a network over
a long period of time, without ever alerting a firewall or NIDS [38]. Varying the scan
types and parameters of the scan and testing how effectively and quickly a scan is
detected is one measure of performance for the HoneyHive framework.
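
As a rough illustration of how these flag groups combine, the following Python sketch assembles (but does not run) two Nmap command lines. The flags shown are standard Nmap options; the helper name build_scan, the target address (taken from the 192.0.2.0/24 documentation range), and the specific values are assumptions made for illustration and are not the scan parameters used in this research.

```python
def build_scan(target, scan_type="-sS", timing=3, evasion=None):
    """Return an Nmap argument list; nothing is executed here."""
    if not 0 <= timing <= 5:
        raise ValueError("Nmap timing templates range from -T0 to -T5")
    args = ["nmap", scan_type, "-T%d" % timing]
    args.extend(evasion or [])   # optional firewall / NIDS evasion flags
    args.append(target)
    return args


TARGET = "192.0.2.10"  # hypothetical honeypot address

# Default-speed SYN scan with no evasion.
default_scan = build_scan(TARGET)

# Slow, fragmented SYN scan with a spoofed TTL, source port, and random MAC.
stealthy_scan = build_scan(
    TARGET,
    scan_type="-sS",
    timing=1,
    evasion=["--mtu", "8", "--ttl", "64", "--source-port", "53",
             "--spoof-mac", "0", "--scan-delay", "5s"],
)

print(" ".join(default_scan))
print(" ".join(stealthy_scan))
```

Keeping the scan as an argument list makes it straightforward to vary one parameter at a time (scan type, timing template, or evasion flags) when measuring how quickly each variant is detected.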

Wireshark and TCPDump Packet capturing software is used to sniff traffic on a network and inspect it in further detail later. Both Wireshark [39] and TCPDump [40] utilize the libpcap library to capture network traffic, but Wireshark is
GUI-based, while TCPDump is Command Line Interface (CLI)-based. Filters can
be used to eliminate unwanted traffic either before or after capture to narrow in on
specific hosts or protocols. Although both Wireshark and TCPDump allow quick
filtering of data by specifying source / destination hosts and ports, TCPDump allows
filtering on specific bytes in frames, packets, datagrams, and applications using the
Berkeley Packet Filter (BPF) syntax [41]. Although Wireshark has a command line
equivalent (tshark), TCPDump is used to collect traffic in finer granularity, and the
captures are then analyzed in Wireshark for this research [39][40][42].
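
To make the idea of byte-level filtering concrete, the sketch below mirrors in Python what a BPF expression such as 'tcp[13] == 2' tests: byte 13 of the TCP header holds the flag bits, and the value 2 means that only SYN is set. The helper names and the hand-built header are assumptions for illustration; real captures would come from TCPDump rather than from struct.pack.

```python
import struct

TCP_FLAGS_OFFSET = 13  # byte index of the flags field in the TCP header
TCP_SYN = 0x02
TCP_ACK = 0x10


def make_tcp_header(src_port, dst_port, flags):
    """Build a minimal 20-byte TCP header (no options, zero checksum)."""
    data_offset = 5 << 4  # header length of 5 32-bit words, in the upper nibble
    return struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0,
                       data_offset, flags, 8192, 0, 0)


def is_syn_only(tcp_header):
    """Same test as the BPF expression 'tcp[13] == 2'."""
    return tcp_header[TCP_FLAGS_OFFSET] == TCP_SYN


def has_syn(tcp_header):
    """Same test as the BPF expression 'tcp[13] & 2 != 0'."""
    return tcp_header[TCP_FLAGS_OFFSET] & TCP_SYN != 0


syn = make_tcp_header(40000, 80, TCP_SYN)
syn_ack = make_tcp_header(80, 40000, TCP_SYN | TCP_ACK)
print(is_syn_only(syn), is_syn_only(syn_ack), has_syn(syn_ack))
```

The first test isolates connection attempts such as those produced by a SYN scan, while the masked form also matches the SYN/ACK replies.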

VMware Workstation VMware Workstation is a hypervisor software solution that allows creating and running VMs [43]. Virtual Machines contain a separate
emulated CPU, OS, memory, and disk space. Hardware from the host system is shared
between it and all running VMs and the amount of resources allocated to a VM is
highly configurable, see Figure 2. Users can select the amount of memory and disk
space a VM has access to, as well as the number of processor cores and peripherals
it can use. Not only does VMware allow running multiple OSs on a single computer
without having to reboot, as is the case with multi-booting, but it also provides a
more secure, sandboxed environment to run applications. If an application becomes
compromised in a VM, only that VM’s OS is affected; other VMs as well as the host
OS are not affected, see Figure 3. An attacker would have to break out of both the VM
OS and the VMware hypervisor to get at the host OS, a lengthier and more complicated
process than escaping Docker containers [43] [44].

Figure 2. VMware Workstation Hardware Settings

Docker Docker is a program that allows running container images. A container image is packaged software with all the dependencies included, i.e., libraries,
code, and other tools needed for execution. Because the containerized software includes all dependencies, execution is the same across different infrastructures. This
greatly increases the portability and stability of software across different devices. At
runtime, container images become containers, which results in isolating the running
software from other processes and giving it only user-level privileges. This improves
the security of the applications because now if the application is compromised by an
attacker, the attacker is limited to access of that container only, essentially a sandbox.
Whereas VMs run a guest operating system on top of the hypervisor to sandbox each
application, the Docker engine runs right on the host operating system. With the
elimination of the guest operating system layer from Figure 3, containers utilize fewer
resources (memory, disk, CPU), which allows running more containerized applications
than virtual machines, as shown in Figure 4 [44].
Docker claims to increase application and device security; however, Sever and
Kišasondi demonstrate that if container images are misconfigured, then attackers can
compromise other containers and possibly escape the container to compromise the host
operating system [45]. While there are configurations and security measures that can
be put into place to prevent this, the prepared Dockerfiles and many GitHub images
do not use them [45]. Any system that is improperly configured becomes susceptible
to exploitation, and therefore users of Docker should not assume their systems are
secure just because they are running containers. In fact, there are known exploits to
escape Docker containers, as listed in the Exploit Database website [46]. Users should
properly configure their containers with security in mind, and then lock down the
security of the host operating system as well.

Figure 3. Virtual Machine Structure [44]

2.3 Related Research

Many different frameworks for building honeypots have been developed and are
explored in this section. They range from generic honeypots to ICS / SCADA and IoT
honeypots. There are numerous honeypots that serve one purpose such as a specific
exploit or service, but the focus of this section is honeypot frameworks that allow the
creation of many convincing IoT honeypots.

2.3.1 Conpot

Conpot is developed and maintained by the Honeynet Project and is used to create
ICS honeypots. Because IoT devices are used to control things like thermostats,
electrical components, and appliances, they bear an ever-increasing resemblance to
ICS. Conpot provides a suite of protocols found on ICS networks and throttles their
responses to mimic real system response time [47].

2.3.2 IoT Web-Based Honeypots by Lukas Stafira

Using the Honeyd framework and Python, Stafira emulated the web services for
three IoT devices to create realistic and interactive web-based honeypots [6]. These
devices included the TITAThink Camera, Proliphix Thermostat, and ezOutlet2 Power
Outlet. In order to make the devices appear dynamic, Stafira accessed local data,
such as time and weather, and used them to generate web pages when responding to
HyperText Transfer Protocol (HTTP) requests. Stafira tested whether Honeyd could
be used to create near duplicate honeypots that simulate the web traffic of several IoT
devices. The honeypots Stafira created successfully mimicked the HTML data of web
transmissions for the real devices. He also tested the Transmission Control Protocol
(TCP)/IP and HTML header similarity, response time, and Nmap completion time
for SYN, UDP, and FIN scans. Stafira compared his results to the physical IoT devices
using Wireshark, Nmap, and custom Python scripts. His test network configuration
is shown in Figure 5. Honeyd is shown being able to run all three IoT honeypots on
a single VM. Overall, Stafira’s results showed that it is possible to create convincing
IoT honeypots, and the honeypots he created are used in this research as convincing
IoT sensors for network intrusion detection [6].

Figure 5. Stafira’s Network Configuration [6]
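
The idea of filling a web page with local data at request time can be sketched as follows. This is not Stafira's code: the function names, the page layout, and the temperature value are hypothetical, and the sketch is written for Python 3 rather than the 2.7.X used for the original honeypots.

```python
from datetime import datetime


def render_thermostat_page(now, temperature_f):
    """Fill a static HTML template with values that change between requests."""
    return (
        "<html><body><h1>Thermostat</h1>"
        "<p>Time: {}</p><p>Temperature: {:.1f} F</p>"
        "</body></html>"
    ).format(now.strftime("%Y-%m-%d %H:%M:%S"), temperature_f)


def build_http_response(body):
    """Wrap the page in a minimal HTTP/1.1 response with a correct length."""
    payload = body.encode("utf-8")
    headers = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html\r\n"
        "Content-Length: {}\r\n\r\n"
    ).format(len(payload))
    return headers.encode("ascii") + payload


# A fixed timestamp and reading are used here so the result is reproducible;
# a honeypot would pass datetime.now() and a locally obtained reading instead.
response = build_http_response(
    render_thermostat_page(datetime(2019, 1, 1, 12, 0, 0), 71.5))
print(response.decode("utf-8"))
```

Because the timestamp and reading differ on every request, repeated visits do not return byte-identical pages, which is one of the cues that can give away a static honeypot.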

2.3.3 Honeycomb by Christian Kreibich

Honeycomb is a tool for automatic signature generation from malicious network
traffic captured with honeypots, specifically those that are part of the Honeyd framework [48].
Kreibich treats all traffic captured by honeypots as malicious because interaction with
them is suspicious and not routine. The signatures generated are formatted for both
the Zeek (formerly Bro) and the Snort NIDS [49] [50]. Honeycomb hooks into Honeyd
and keeps track of network connections (IP and port combinations), while filtering out
traffic received from being scanned, and generates signatures using the Longest Common Substring (LCS) algorithm. Using Honeycomb, Kreibich successfully generated
signatures for both the Slammer Worm and the CodeRed II Worm [48].
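
A minimal sketch of the longest common substring step is given below. It uses a simple dynamic-programming formulation chosen for brevity, which should not be read as Honeycomb's actual implementation, and the two payloads are invented for illustration.

```python
def longest_common_substring(a, b):
    """Return the longest contiguous run of bytes shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run that ended at a[i-2], b[j-2].
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


# Two hypothetical payloads seen on different honeypot connections.
payload_one = b"GET /a.php?cmd=AAAA HTTP/1.0"
payload_two = b"POST /b.cgi?cmd=AAAA&x=1"

candidate_signature = longest_common_substring(payload_one, payload_two)
print(candidate_signature)
```

The shared byte string is what would be carried into a content-matching rule; the surrounding, connection-specific bytes are discarded.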

2.3.4 Honeyd Syslog Solutions

Kiwi Syslog Server Kloet demonstrated how Kiwi can be used to
filter Syslog messages generated by Honeyd. The Syslog messages were sent from the
host machine running Honeyd to the machine running the Kiwi NIDS [51]. The Kiwi
program then filtered Syslog messages and generated Simple Mail Transfer Protocol
(SMTP) email alerts based on predefined rules, such as a connection being established
to a honeypot. Kloet also mentioned remedies for false positives which include fine
tuning the Kiwi alert threshold, creating a static route to null, and excluding the
address that the Honeyd daemon listens on [51]. Kloet’s solution of forwarding the
Honeyd generated Syslog to a program more capable of parsing and displaying the
alerts in a readable format is useful for small networks. However, in larger networks,
network administrators could easily be flooded by emails, whether the emails are
actual alerts or false positives, and it could be difficult to piece together and visualize
what is happening. While Kiwi can filter out the noise of false positives, it has
no graphical overview of the network for easy real time interpretation by network

Honeycomb by Lavenya and Kaur Honeycomb by Lavenya and Kaur is
a honeypot log management tool. It gathers all log files generated by the Honeyd
framework, emails them in one file for download, and then allows importing and
inspecting them in the web-based GUI [53]. Much like the Kiwi syslog server solution,
it is a step towards making the Honeyd log files more manageable and collection of
them automated. However, the alerts are still not conveyed to network operators fast
enough and the data in the log files is not in-depth enough to capture exploits, tools,
or an attacker’s TTPs [53].

2.3.5 IoTCandyJar

Ramirez et al. discuss the need for their framework since building custom IoT
honeypots or buying the actual physical device to create honeypots is too costly
[54]. The vast heterogeneity of IoT devices makes creating custom IoT honeypots
time consuming, and the resulting honeypots often are not high-functioning enough. To combat this, their
framework uses machine learning to replicate the behavior of IoT devices, dynamically
creating realistic honeypots, and presenting them as convincing devices to attackers
[54]. Figure 6 displays the IoTCandyJar framework. This framework consists of three
dynamic honeypots that attackers interact with (left), a DB that records responses
from scanned IoT devices on the Internet (middle), the IoTScanner which conducts
the scanning of IoT devices on the Internet (top right), and the IoTLearner which
uses heuristics and training to predict correct responses to attackers (bottom right).
Requests from attackers are first sent to IoTCandyJar’s dynamic honeypots. These
honeypots then query the DB for responses that could be correct. The result of
the query is in-turn passed to the IoTLearner which uses heuristics to select what it
believes to be the correct response. This response is finally forwarded to the attacker.
The IoTScanner constantly adds to the DB by using attacker requests to scan IoT
devices on the Internet [54].
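
The request flow described above can be sketched in a few lines of Python. This is a toy model only: the class and method names are hypothetical, and the scoring rule (reward a response when the attacker keeps interacting) is an assumed stand-in for the heuristics and training that IoTLearner actually uses.

```python
from collections import Counter


class CandidateResponder:
    """Toy stand-in for the honeypot, DB, and IoTLearner loop."""

    def __init__(self):
        self.db = {}            # request -> responses recorded from scans
        self.score = Counter()  # (request, response) -> heuristic score

    def record_scan(self, request, response):
        """IoTScanner role: store what a real device answered."""
        responses = self.db.setdefault(request, [])
        if response not in responses:
            responses.append(response)

    def reward(self, request, response):
        """Assumed feedback: the attacker continued after this response."""
        self.score[(request, response)] += 1

    def respond(self, request):
        """Honeypot role: return the best-scoring candidate, if any."""
        candidates = self.db.get(request)
        if not candidates:
            return None
        return max(candidates, key=lambda resp: self.score[(request, resp)])


responder = CandidateResponder()
responder.record_scan("GET /login", "HTTP/1.1 200 OK (vendor A page)")
responder.record_scan("GET /login", "HTTP/1.1 401 Unauthorized")
responder.reward("GET /login", "HTTP/1.1 401 Unauthorized")
print(responder.respond("GET /login"))
```

A request that has never been scanned returns no candidate, which corresponds to the case where the framework must first forward the request to devices on the Internet.
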
While the framework can quickly imitate any IoT device connected to the Internet,
the methodology cannot precisely match responses for exploits without sending actual
IoT devices the exploits, which is illegal. IoTCandyJar does use some extent of exploit
filtering, but this only works for known exploits. For the known exploits, they either
have to manually create the response or drop the connection altogether which means
they are back to creating custom low-interaction honeypots. For unknown
exploits, their own system may very well become an attacker itself. In addition, IoT
devices have specific ports open and services running on them, which would preclude
a single device from responding to an Nmap scan with the exact IoT profile the attacker
is targeting [54].

2.3.6 HoneyLab

HoneyLab is a distributed framework for deploying and sharing honeypots between
cyber-security researchers that seeks to address the shortcomings Chin et al. described
as infrastructure fragmentation, flexibility for deploying devices, and the limited IP
address space [55]. HoneyLab runs honeypots in a virtualized environment for high
level interaction and attack containment. The framework, shown in Figure 7, is
composed of a web interface to register / login and control honeypots, the C2 node
called HoneyLab Central, and sensor nodes distributed worldwide that run on Xen
servers. The Xen servers deploy honeypot VMs alongside VMs with sensing software.
Users can upload custom VM images which allows for maximum flexibility and custom
honeypot support. Commands can be issued to honeypots through HoneyLab’s web
interface but users also have the option of interacting with their honeypots through
Virtual Network Computing (VNC) or a remote shell after establishing a Virtual
Private Network (VPN) connection to the network. All sensor nodes run the HoneyLab
daemon software to communicate with the C2 HoneyLab Central device to report
alerts and receive commands [55].
Limitations with the framework include IP-only level traffic (no Ethernet), all traffic must go through the HoneyLab Central device which could become overburdened,
and all outgoing connections (reverse connections) are blocked. These limitations
affect the convincingness of the actual honeypots and may not fool attackers. Also,
once an attacker is in the target’s internal network, they can see that the honeypot
is not part of the network and all traffic is forwarded to it. Additionally, it is not
apparent how propagation throughout the HoneyLab honeynet from a compromised
honeypot is prevented. Finally, the research appears to be discontinued because the
website was not found to be up and operational [55]. Like HoneyLab, the IoT honeypot sensors in this research beacon back to a central command and control server
for real time alerts.

2.3.7 SIPHON

Like HoneyLab, SIPHON is a globally distributed honeynet intended to be a
“Scalable, high-Interaction Physical HONeypot” framework [28]. As shown in Figure
8, the framework uses IP addresses distributed around the world from servers rented
from cloud providers (Amazon, Digital Ocean, and Linode) that act as “wormholes”
– interconnecting the honeynet through SSH tunnels. By using this design, certain
geographically located devices that are more desirable to attackers can be simulated.
As Figure 9 illustrates, the wormholes send the attacker’s traffic to SIPHON’s “forwarder” devices that change IP address and perform man-in-the-middle attacks before
finally sending the traffic to actual physical IoT devices. This setup is very similar to
IoTCandyJar’s method of sending traffic to physical devices, but instead of merely
recording the devices’ responses for replay, SIPHON’s network owns the IoT devices
and can therefore allow high interaction and record advanced attacker methodology
[28].

2.3.8 HoneyIo4

HoneyIo4 by Alejandro Guerra Manzanares is a low-interaction honeypot with
four Python scripts to match the expected Nmap DB scan responses for the following
IoT devices: GoPro Hero3 camera, Casio QT6600 cash register, Nintendo Wii video
game console, and Oki B4545 printer [23]. HoneyIo4 also includes a web-based GUI
that allows starting or stopping each honeypot by simply executing the associated
Python script. While HoneyIo4 successfully matched target Nmap DB OS profiles, it
appears to be trying to re-invent the wheel as the Python scripts attempt to do what
Honeyd does already. Honeyd allows for quickly customizing ports, responses, and
specifying an OS profile from an Nmap DB for response traffic to match. Manzanares
claims that Honeyd cannot match IoT OS fingerprints, but as long as the OS is in
the Nmap DB file supplied to Honeyd, the traffic can be matched.
Honeyd also has more advanced capabilities than HoneyIo4 such as running multiple honeypots at once on the same physical device, routing of network traffic, and
keeping state for each honeypot [23].

Figure 9. Attacker’s Interaction with SIPHON [28]

2.3.9 IoTPOT and IoTBOX

IoTPOT and IoTBOX is a two-part honeypot system consisting of a Telnet service
“frontend” (IoTPOT) and sandboxed “backend” (IoTBOX) [22]. IoTPOT changes its
responses to match different IoT devices that an attacker is targeting based on their
initial Telnet requests as illustrated in Figure 10. By using this method, IoTPOT can
appear to be a vast number of different IoT devices. IoTPOT also logs all traffic which
includes login attempts and credentials. Login settings can also be customized to allow
authentication on the first attempt, a specific username and password combination,
or authenticate only after a set number of attempts. After an attacker successfully
authenticates, IoTPOT checks if the command issued has a known, stored response.
If it is a known command, IoTPOT responds to the attacker directly. If the command
is not known, IoTPOT forwards the command to IoTBOX, stores IoTBOX’s response
so it can quickly respond to the same command in the future, and then forwards it
to the attacker [22].
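
The known-command lookup with a fallback to the sandbox can be sketched as a small cache. The class name, the callable used in place of IoTBOX, and the sample command output are hypothetical and only illustrate the control flow described above.

```python
class FrontendResponder:
    """Answer known commands directly; ask the backend once for new ones."""

    def __init__(self, backend):
        self.backend = backend  # callable standing in for the IoTBOX sandbox
        self.known = {}         # command -> stored response
        self.log = []           # every command received, in order

    def handle(self, command):
        self.log.append(command)
        if command not in self.known:
            # Unknown command: forward it, then remember the answer.
            self.known[command] = self.backend(command)
        return self.known[command]


backend_calls = []


def fake_sandbox(command):
    """Hypothetical backend that records how often it is consulted."""
    backend_calls.append(command)
    return "output of: " + command


frontend = FrontendResponder(fake_sandbox)
first = frontend.handle("cat /proc/cpuinfo")
second = frontend.handle("cat /proc/cpuinfo")
print(first == second, len(backend_calls), len(frontend.log))
```

Only the first occurrence of a command reaches the sandbox, so repeated commands are answered at frontend speed while every request is still logged.
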
The design of IoTBOX is shown in Figure 11. Because some commands may
download malware, IoTBOX is run in a controlled environment with frequent image
resets. IoTBOX uses QEMU to emulate eight different Central Processing Unit (CPU)
architectures, each of which then runs the OpenWRT OS. The benefit of this is that
malware executables are compiled to run on a specific CPU architecture, and through
CPU emulation, the captured malware can be run and analyzed in depth [22].
It is not apparent how IoTPOT would know the correct banner response an attacker
is looking for from a specific IoT device. Also, because IoTPOT uses one IP
address, an attacker or scanning tool that documents device analysis would notice
this single IP responds like multiple different devices and may become suspicious of
it being a honeypot.

2.3.10 Multi-Purpose IoT Honeypot

Inspired by IoTPOT, Krishnaprasad created the “Multi-Purpose IoT Honeypot”
to handle four protocols commonly used by IoT devices: Secure Shell (SSH), Telnet,
HTTP, and CPE WAN Management Protocol (CWMP) [56]. Multi-Purpose IoT
Honeypot utilizes a “frontend” proxy that is running a Python script for each of the
supported protocols. The frontend logs data about the attack and then forwards it
to the corresponding service “backend”, which consists of only two Docker machines running
the services. While Multi-Purpose IoT Honeypot is running common services for IoT
devices, it does not tailor its responses to Nmap scans to convince attackers
that it is in fact an IoT device and not a honeypot. Furthermore, if an attacker does
connect to a service, they can see that it is not an IoT device and is not connected to any
real network. Krishnaprasad’s use of Docker to containerize honeypot machines is
a concept also used by Cymmetria’s Honeycomb framework. This technique allows
for high-interaction honeypots that are easier to develop, deploy, and maintain than
physical devices or traditional virtual machines [56].

2.3.11 ThingPot

Like IoTPOT and Multi-Purpose IoT Honeypot, ThingPot also has a frontend and
backend design [57]. ThingPot classifies itself as a “Medium Interaction Honeypot”
simulating Extensible Messaging and Presence Protocol (XMPP) and Message Queue
Telemetry Transport (MQTT) and low interaction for HTTP REST traffic. Each of
these services is also run in a virtual environment using Docker. An overview of
ThingPot’s design is shown in Figure 12. The XMPP and REST nodes implement
their respective protocols while the controller node logs and stores data. Using this
design, ThingPot imitated a Philips Hue smart light and had an actual attacker try
to take control of it [57].

2.3.12 IoTSec

One proposed solution for securing IoT devices is the use of intermediary devices
called µmboxes, which sit between IoT devices and dynamically configure
firewall rules to allow for the specific traffic of IoT devices on the network, essentially
acting as a personal firewall or Blue Coat proxy for the IoT devices [20]. µmboxes work
together and alert the centralized IoTSec Control Platform (C2) if an intrusion or
anomaly is detected. They utilize several different methods for detecting intrusions:
signature matching, network baseline generation, and cross-device policies. Signature
matching and network baselining are not new concepts, but cross-device policies are
interesting because using the functionality of other IoT devices, a safety check can
be performed. The example Yu et al. give is an IoT camera checking that a person
is home before a smart oven is allowed to be issued the “on” command [20].
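
A cross-device policy of this kind can be sketched as a condition on shared device state that is evaluated before a command is passed on. The policy structure, the state key camera.person_home, and the function name are assumptions for illustration and do not reflect the IoTSec implementation.

```python
def is_allowed(device, command, state, policies):
    """Return True unless a matching cross-device policy's condition fails."""
    for policy in policies:
        if (policy["device"], policy["command"]) == (device, command):
            if not policy["condition"](state):
                return False
    return True


# Hypothetical policy: the oven may only be switched on if the camera
# currently reports that a person is home.
POLICIES = [
    {
        "device": "oven",
        "command": "on",
        "condition": lambda state: state.get("camera.person_home", False),
    },
]

print(is_allowed("oven", "on", {"camera.person_home": True}, POLICIES))
print(is_allowed("oven", "on", {"camera.person_home": False}, POLICIES))
print(is_allowed("oven", "off", {}, POLICIES))
```

Commands without a matching policy pass through unchanged, and a missing camera reading is treated as nobody being home, so the check fails closed.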

2.3.13 Honeycomb and MazeRunner by Cymmetria

Cymmetria is a cyber-security solutions company based out of Tel Aviv, Israel
and was founded in 2014 by Gadi Evron [58]. The Chief Executive Officer (CEO),
Gadi Evron, has over 15 years of experience in cyber security and was the former vice
president of cyber security strategy at Kaspersky Lab. Cymmetria’s flagship product
is MazeRunner which utilizes their Honeycomb framework [59].
Honeycomb by Cymmetria, not to be confused with Honeycomb by Kreibich (a
plugin for Honeyd), or Honeycomb by Lavenya and Kaur (a Honeyd log manager),
allows for rapid and customized honeypot creation using containers and Python plugins. Honeypots are spawned off through the use of these containers [60].
MazeRunner allows users to create honeypots, add services, and change configurations all rapidly and with a GUI. MazeRunner is the container that manages all
the honeypots and acts as the command and control for intrusion detection. It provides an overview of real time alerts of interaction with the honeypots. It implements
packet capture, memory dump, shows what commands an attacker ran and allows
downloading the tools an attacker used. The tool’s hash can then be propagated as a
signature to flag on throughout the network. Custom scripts such as Stafira’s can be
run with the Enterprise edition for even more customized honeypots. The backend
uses their Honeycomb framework [61] [62] [63].
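
Propagating a captured tool's hash as a signature can be illustrated with Python's standard hashlib module. The class name, the choice of SHA-256, and the sample byte strings are assumptions; the hash algorithm and distribution mechanism MazeRunner uses are not specified here.

```python
import hashlib


def tool_signature(data):
    """Hex SHA-256 digest used as an exact-match signature for a file."""
    return hashlib.sha256(data).hexdigest()


class SignatureStore:
    """Set of hashes that every sensor on the network could consult."""

    def __init__(self):
        self._hashes = set()

    def add_captured_tool(self, data):
        digest = tool_signature(data)
        self._hashes.add(digest)
        return digest

    def is_known_tool(self, data):
        return tool_signature(data) in self._hashes


store = SignatureStore()
captured = b"#!/bin/sh\necho hypothetical attacker tool\n"
store.add_captured_tool(captured)

print(store.is_known_tool(captured))
print(store.is_known_tool(captured + b"# modified"))
```

An exact hash only flags byte-identical copies, so an attacker must alter or rebuild a captured tool before reusing it, which is the added cost described above.
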
MazeRunner contains a component called ActiveSOC which automatically investigates an incident using rules and heuristics to determine if the incident needs further
investigation by an analyst. Using ActiveSOC, false positives can be reduced and
analysts can focus on investigating actual intrusions [64].
MazeRunner has been successful in catching red teams in NATO exercises [29] as
well as APTs such as APT3 (pirpi – a Chinese Threat Actor) in European government networks, defense contractor networks, and several other customers’ networks
[65]. Additionally, MazeRunner successfully captured the tools and TTPs of the
cyberespionage group Patchwork, as Patchwork moved throughout the MazeRunner
network. Patchwork, aptly named from the copy-paste code used from online forums,
is a targeted attack against government agencies and has infected several thousand
machines since 2015 [66].

2.3.14 Comparison of Related Frameworks

This research builds off of the IoT honeypots created by Lukas Stafira. The closest research to the HoneyHive framework is MazeRunner by Cymmetria. However,
MazeRunner does not use nor allow creating custom IoT honeypots as network intrusion detection sensors. Table 1 provides a summary and comparison of all the
frameworks mentioned. The specific categories compared include the honeypot level
of interaction (Honeypot Level), whether or not the framework focuses on IoT honeypots (IoT), whether or not the framework implements full packet capture (PCAP),
whether or not the framework is distributed (Distributed), whether or not the framework was developed with the intent for it to be used as a NIDS (NIDS), whether or
not the framework reports alerts and receives commands from a C2 server (C2), and
the year of the last update on the framework (Last Update). The various levels of
honeypot interaction include low, medium, and high. An “X” in a category denotes
the framework possesses that trait. Neither Conpot nor Honeyd was made to create
IoT honeypots specifically, perform full packet capture, have a distributed framework,
be implemented as a NIDS, or have a Command and Control structure.

Table 1. Comparison of Honeypot Frameworks adapted from [19]

2.4 Chapter Summary

This chapter discusses the exponential growth of IoT devices and the need for
improved IoT and computer network security. Several methods for improving network security include the use of a NIDS, network monitoring, and honeypots. The
HoneyHive framework utilizes a combination of all these aspects for increased network security. Languages and tools for the development of the HoneyHive framework
are also covered in detail. Finally, related research in the field of IoT honeypots is
explored to understand existing solutions, their shortcomings, and their inspiration
in the development of this research.

Source: Honeyhive – A Network Intrusion Detection System Framework Utilizing Distributed Internet of Things Honeypot Sensors

Honeyhive – A Network Intrusion Detection System Framework Utilizing Distributed Internet of Things Honeypot Sensors

I. Introduction

1.1 Background