<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>6lab.cz | RSS Feed</title>
	<atom:link href="http://6lab.cz/author/martin-zadnik/feed/" rel="self" type="application/rss+xml" />
	<link>http://6lab.cz</link>
	<description>Networking, IPv6, Security</description>
	<lastBuildDate>Tue, 24 Oct 2017 08:54:46 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.9.1</generator>
	<item>
		<title>Design of Data Retention System in IPv6 network</title>
		<link>http://6lab.cz/design-of-data-retention-system-in-ipv6-network/</link>
		<comments>http://6lab.cz/design-of-data-retention-system-in-ipv6-network/#comments</comments>
		<pubDate>Sun, 11 Dec 2011 18:22:29 +0000</pubDate>
		<dc:creator><![CDATA[Martin Zadnik]]></dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://ipv6.vutbr.cz/?p=888</guid>
		<description><![CDATA[Data monitoring and data retention are vital for network management and troubleshooting. The network and the provided services are expected to be seamlessly available. Administrators often collect information about the on-going traffic in the form of IP flow records, both to reveal potential malicious activity that might violate the network usage policy and to meet legal requirements on providing data about electronic communication to authorized organizations. It is vital not only to collect data about the traffic but also to track the identity of the users responsible for it. The deployment of IPv6 renders unique user identification problematic, or at least a complex task in comparison with the IPv4 environment. In this report, we suggest a data retention system with user identification capabilities in IPv6 as well as IPv4 networks. This is achieved by extending flow records with information obtained by monitoring the state of network devices via SNMP and the state of control servers such as RADIUS. The system has been successfully deployed in the BUT network.]]></description>
				<content:encoded><![CDATA[<h2>1 Introduction</h2>
<p>A data retention system generally allows a network operator or law enforcement agencies to track malevolent users or security incidents. A network data retention system, especially in a large network, has to handle a significant volume of data. Storing the network traffic itself is not a feasible option due to limited storage capacity and speed. Therefore the system is usually based on some form of flow monitoring &#8211; e.g. NetFlow. NetFlow data provide the information necessary for data retention &#8211; the source, destination, duration and type of communication, together with the amount of transferred data. However, are NetFlow data sufficient for a data retention system if the IPv6 protocol is deployed in the network?</p>
<p>The IPv6 protocol creates new challenges. Unlike IPv4, an IPv6 address no longer identifies a user or PC uniquely, because an IPv6 address can be randomly generated and keeps changing. This document discusses the general specification of a data retention system according to the ETSI TS 102 657 [<a href="#lit_2">2</a>] document and addresses the major monitoring issues of IPv6 connectivity. A practical solution for monitoring both IPv4 and IPv6 traffic is proposed. The proposed data retention system is able to monitor and identify a user in both IPv4 and IPv6 traffic. The solution requires an extension of the monitoring data collected from network devices. A new data structure based on an extension of NetFlow records is presented. The data retention system is deployed in the Brno University of Technology (BUT) campus network. The document discusses the results of data retention traffic monitoring and future challenges.</p>
<h2>2 IPv6 and user identification</h2>
<p>Many internal resources require the ability to track the end user&#8217;s use of services. IPv6 address tracking (or data retention) is also a legal obligation that governments impose on ISPs. If a local security policy requires better control, either fixed IPv6 addresses must be centrally assigned and logged, or stateful configuration using DHCPv6 has to be deployed. If stateless auto-configuration is deployed, a new monitoring system is required.</p>
<p>Temporary IPv6 Addresses: Auto-configuration is a new IPv6 feature that allows a node to automatically generate an IPv6 address on its own. This behavior is different from IPv4 address configuration, in which the IP address is configured either manually or using DHCP. An IPv6 node can be configured through either stateless or stateful auto-configuration. The basic stateless configuration combines a network prefix obtained from the router with the IEEE EUI-64 identifier based on the MAC address. This allows keeping the link between an address and user/host, but the host part of the address can easily be tracked all over the Internet.</p>
<p>For the sake of user privacy, IPv6 addresses with randomly generated 64-bit interface identifiers are preferred over IEEE EUI-64. The RFC 4941 standard defines a way to generate and change temporary addresses. The important requirement is that the sequence of temporary addresses generated on an interface must be totally unpredictable.</p>
<p>However, this requirement contradicts the need to identify a malevolent user. Private, temporary addresses hinder the unique identification of users/hosts connecting to a service. This affects logging and prevents administrators from effectively tracking which users are accessing what services.</p>
<p>Stateful IPv6 configuration: The DHCP Unique Identifier (DUID) can be used to identify a user in an IPv6 network. However, the DUID has several disadvantages. Its value is not easily searchable, since every client stores it at a different place on the local disk, and the value changes whenever the operating system is reinstalled. Experience at BUT shows that using stateful configuration for address assignment is extremely difficult. First of all, even if an IPv6 address is assigned to a host with Windows 7 or Vista using DHCPv6, the host will not use this address for communication but will use a temporary address instead. Secondly, the DHCPv6 client is not supported in Windows XP, which is still widely in use.</p>
<p>Stateless IPv6 configuration: The first part of the IPv6 address &#8211; the network prefix &#8211; is assigned using RA messages as described in the previous chapter. RA messages do not provide any type of unique identifier that could be used to identify the host. The second part of the IPv6 address &#8211; the interface ID &#8211; is generated using EUI-64 or privacy extensions. EUI-64 could be used for host identification since its value is derived from the MAC address. However, Windows 7 and Vista use randomly generated interface IDs by default. Thus, neither stateful nor stateless configuration provides the unique ID needed for user identification. More information has to be obtained, as discussed in the following section.</p>
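<p>The difference between the two interface-ID schemes can be illustrated with a short sketch (Python is used here purely for illustration; the MAC address is an example value). The EUI-64 identifier is derived deterministically from the MAC address, which is exactly why it stays recognizable wherever the host connects:</p>

```python
def eui64_interface_id(mac: str) -> str:
    """Derive the EUI-64 interface identifier from a 48-bit MAC address:
    split the MAC in half, insert 0xFFFE in the middle and flip the
    universal/local bit (0x02) in the first octet."""
    octets = [int(b, 16) for b in mac.split(":")]
    octets[0] ^= 0x02                                  # flip the U/L bit
    eui = octets[:3] + [0xFF, 0xFE] + octets[3:]
    # format as the four 16-bit hextets of the IPv6 host part
    return ":".join(f"{(eui[i] << 8) | eui[i + 1]:04x}" for i in range(0, 8, 2))

print(eui64_interface_id("00:21:5c:81:61:91"))  # 0221:5cff:fe81:6191
```

<p>An RFC 4941 temporary address replaces this deterministic suffix with a randomly generated 64-bit value that changes over time &#8211; which is precisely what breaks the address-to-host link the monitoring system has to restore.</p>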
<h2>3 Proposed architecture</h2>
<p>The proposed architecture follows the standard defined in the ETSI document TS 102 657 [<a href="#lit_2">2</a>]. Although the architecture of the data retention system is only briefly described there, it provides a high-level overview of the whole system and defines several basic building blocks.</p>
<p>At the top level, ETSI defines two communicating entities, the Communication Service Provider (CSP) and the Authorized Organization (AO). ETSI suggests establishing two communication channels between the AO and the CSP as a Handover Interface (HI). The first channel (HI-A) delivers administrative request/response information whereas the second channel (HI-B) transfers only Retained Data (RD). Figure 1 displays this model.</p>
<p><a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure_1.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/figure_1.png" alt=" Model of CSP and AO Handover Interface (taken from [2])" title=" Model of CSP and AO Handover Interface (taken from [2])" width="697" height="211" class="aligncenter size-full wp-image-1224" /></a></p>
<p><strong>Fig. 1. Model of CSP and AO Handover Interface (taken from [<a href="#lit_2">2</a>])</strong></p>
<p>The architecture and processes of the AO are out of the scope of this document. We focus on breaking down the CSP block. Based on ETSI, three functional blocks can be identified within the CSP.</p>
<p>An Administrative Function (AF) implements both channels of the HI and an internal interface to acquire retained data from the Data Store Management Function (DSMF). The task of the AF is to receive and acknowledge requests for RD, transform and issue these requests in the syntax of the DSMF, report on the state of on-going queries and finally deliver the results of the queries as RD over HI-B.</p>
<p>A Data Collection Function (DCF) collects data from the various internal network elements and prepares the data for retention. This includes both synchronous and asynchronous communication with probes, switches, routers, servers such as DHCP, DNS and RADIUS, and the user database.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure_2.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/figure_2.png" alt="Break down of CSP function (taken from [2])" title="Break down of CSP function (taken from [2])" width="728" height="288" class="aligncenter size-full wp-image-1227" /></a><br />
<strong>Fig. 2. Break down of CSP function (taken from [<a href="#lit_2">2</a>])</strong></p>
<p>Finally, ETSI defines a Data Store Management Function. This process handles indexing and storing the data, executing queries and managing the maximum retention period.<br />
Please note that ETSI TS 102 657 states that the decomposition of the internal architecture is only informative and may vary according to the specifics of the CSP. Moreover, it also allows some functions to be outsourced to a third party, depending on national agreements.</p>
<p>We propose a monitoring system that fits the needs of data retention in IPv6 network. It considers the various sources of collected data, their processing and storage as well as presentation interface. The architecture of the system is depicted in Figure 3.</p>
<p>The figure displays a mapping of the ETSI blocks onto implementation primitives that are relevant for IPv4 and IPv6 networks. Let&#8217;s start with a description of the DCF. There are three data sources of different types at the input of the system. The first data source is NetFlow, generated by probes and routers in the network; the NetFlow data are collected and stored. The second data source is SNMP, which allows data to be transferred from switches and routers. An SNMP poll reads the accessible variables in each device based on its MIB tree and stores this data in the database. The third source of data are event messages and logs of management servers such as RADIUS and DHCP. These data are parsed and the extracted information is stored in the database as well. The DSMF joins the data in the database with the stored NetFlow data. A configuration interface allows the DCF and DSMF processes to be set up and controlled. A data interface handles queries on the stored data generated by various applications, among them the AF.</p>
<p>We propose to assemble several existing tools to implement the whole monitoring system according to the proposed architecture. In the following sections, we discuss the specifics of each data retention block and describe how each block is implemented.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/fig3.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/fig3.png" alt="Proposed architecture of monitoring system" title="Proposed architecture of monitoring system" width="700" height="396" class="aligncenter size-full wp-image-1230" /></a><br />
<strong>Fig. 3. Proposed architecture of monitoring system</strong></p>
<h3>3.1 Data Collection Function</h3>
<p>A Data Collection Function collects data from the various internal network elements and prepares the data for retention. ETSI breaks down the collected data into the following categories (for each category we list some data examples and their possible source within the scope of an IP network).</p>
<ol>
<li>Subscriber data: information relating to a subscription to a particular service (Data: Name, Address, Identifier; Source: User Database).</li>
<li>Usage data: information relating to the usage of a particular service (Data: Flow Records, SIP records; Source: NetFlow, IPFIX, SIP proxy, &#8230;).</li>
<li>Equipment data: information relating to an end-user device or handset (Data: MAC address, OS; Source: Neighbor Cache/Forwarding table in switch, DHCP server, RADIUS server).</li>
<li>Network element data: information relating to a component in the underlying network infrastructure (Data: location and identifier of an access point, statistics from interfaces; Source: network elements via SNMP).</li>
<li>Additional service usage: information relating to additional services used (Data: SMTP, IMAP; Source: Application servers).</li>
</ol>
<p>The DCF must be able to collect data via various interfaces and protocols such as syslog, SNMP, NetFlow/IPFIX, file logs and database interfaces. Some elements work asynchronously, i.e., they transmit data on their own upon an event such as a value exceeding a threshold or a timer expiring. Other elements work synchronously, that is, they must be actively queried for data. The output of the DCF is data ready for retention.</p>
<p>The variability of communication as well as the variability of devices manufactured by various vendors renders the DCF very complex. We propose to adopt and customize existing solutions for network administration and monitoring which already cover this complexity.</p>
<p>From the wide variety of network tools, the Network Administration Visualized (NAV) [<a href="#lit_4">4</a>] suite (a collection of libraries put together with output to a database and a handy GUI) seems to be a good solution for collecting data from network elements. NAV has been developed since 1999 and is nowadays maintained by UNINETT. NAV is freely distributed under the GNU GPLv2 license and supports both IPv4 and IPv6 deployments. NAV is able to poll or receive SNMP messages and events from over 80 network devices (most commonly switches and routers) from 11 different vendors. Moreover, it is able to store logs and messages from a RADIUS server as well.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure_4.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/figure_4.png" alt="Part of NAV architecture" title="Part of NAV architecture" width="492" height="201" class="aligncenter size-full wp-image-1235" /></a><br />
<strong>Fig. 4. Part of NAV architecture</strong></p>
<p>Figure 4 displays a subset of the whole NAV architecture. It consists of several backend processes that run as either daemons or cron jobs. These processes fill up the NAV database (PostgreSQL). Although NAV consists of many processes, this report focuses only on those that are relevant for data retention.</p>
<p>The crucial part for data retention is to collect information about the association of a user, the user&#8217;s IP address, the MAC address and the switch port the user is connected to. This makes it possible to answer typical queries such as who is behind a given IP address, what the user&#8217;s location is, and which addresses the user uses. SNMP is used to obtain data from switches every fifteen minutes. The mapping between the IPv6 address and its corresponding MAC address is downloaded from the router&#8217;s neighbor cache. Port, VLAN number and other information comes from the switch&#8217;s FDB (Forwarding Database) table.</p>
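<p>The resulting lookup can be pictured as a join over the two collections. The following is a minimal sketch with made-up table contents (Python is used for illustration only; the real data live in the NAV PostgreSQL database):</p>

```python
# Hypothetical in-memory stand-ins for the NAV "arp" and "cam" tables.
arp = [  # (ip, mac): pairs read from the router's neighbor cache
    ("2001:db8::21:5cff:fe81:6191", "00:21:5c:81:61:91"),
]
cam = [  # (mac, switch, port): entries read from the switches' FDB tables
    ("00:21:5c:81:61:91", "sw-building-a", 12),
]

def locate(ip):
    """Answer "who is behind this IP address": map IP -> MAC via the
    neighbor cache, then MAC -> switch port via the FDB entries."""
    macs = {m for i, m in arp if i == ip}
    return [(sw, port) for m, sw, port in cam if m in macs]

print(locate("2001:db8::21:5cff:fe81:6191"))  # [('sw-building-a', 12)]
```

<p>The validity timestamps discussed below are omitted here for brevity; the production system must additionally restrict both lookups to entries valid at the time of the flow.</p>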
<p>First of all, the administrator must seed the NAV database with the IP addresses of the monitored network elements. The ipdevpoll process polls each device for inventory information (including interfaces, serial numbers, modules, VLANs and prefixes), for load information and for logging information such as ARP tables or the Neighbor Discovery cache. The obtained information is regularly stored in the database. NAV defines the ARP table schema shown in Table 1.</p>
<p>The mactrace process collects MAC addresses, port, VLAN and other information from the Forwarding Database table of every switch and stores the information in the CAM table, whose schema is described in Table 2. The process also checks for spanning-tree blocked ports.<br />
<center></p>
<table class="tabulka1px">
<tbody>
<tr>
<td>arpid</td>
<td align="left">primary key</td>
</tr>
<tr>
<td>netboxid</td>
<td align="left">router the arp entry comes from</td>
</tr>
<tr>
<td>sysname</td>
<td align="left">the same router in name (in case the router is deleted, arp has historic data)</td>
</tr>
<tr>
<td>prefixid</td>
<td align="left">prefix the arp entry belongs to</td>
</tr>
<tr>
<td>ip</td>
<td align="left">ip address of the arp entry</td>
</tr>
<tr>
<td>mac</td>
<td align="left">mac address of the arp entry</td>
</tr>
<tr>
<td>start_time</td>
<td align="left">time the arp entry was first discovered</td>
</tr>
<tr>
<td>end_time</td>
<td align="left">time the arp entry disappeared (typically 4 hours after the last packet sent)</td>
</tr>
</tbody>
</table>
<p class="figure-comment">Table 1: Schema of the ARP table in the NAV DB (taken from the NAV doc [<a href="#lit_4">4</a>])</p>
<p></center></p>
<p><center></p>
<table class="tabulka1px">
<tbody>
<tr>
<td>camid</td>
<td align="left">primary key</td>
</tr>
<tr>
<td>netboxid</td>
<td align="left">switch that has the cam entry</td>
</tr>
<tr>
<td>sysname</td>
<td align="left">the same switch in name (in case the switch is deleted, cam has historic data)</td>
</tr>
<tr>
<td>ifindex</td>
<td align="left">ifindex of the switch port for the cam entry</td>
</tr>
<tr>
<td>module</td>
<td align="left">module number for the cam entry</td>
</tr>
<tr>
<td>port</td>
<td align="left">port number for the cam entry</td>
</tr>
<tr>
<td>mac</td>
<td align="left">mac address found on the port</td>
</tr>
<tr>
<td>start_time</td>
<td align="left">time the mac address was first seen</td>
</tr>
<tr>
<td>end_time</td>
<td align="left">time the mac address disappeared (the idle timer in bridge tables is typically 5 minutes)</td>
</tr>
<tr>
<td>misscnt</td>
<td align="left">counts how many times an update of the cam entry has been attempted and failed.<br />
We do not want to terminate these cam entries right away.<br />
It is configurable how many misses the camlogger tolerates.</td>
</tr>
</tbody>
</table>
<p class="figure-comment">Table 2: Schema of the CAM table in the NAV DB (taken from the NAV doc [<a href="#lit_4">4</a>])</p>
<p></center></p>
<p>NAV also supports the collection of RADIUS data from a FreeRadius server. The FreeRadius server can be configured to push data about authentication requests directly into the NAV database. In addition to this interface, there is a NAV radius process which regularly parses the FreeRadius log file. It extracts the important messages and pushes them into the database as well. The most important columns of the NAV radius table are described in Table 3.</p>
<p>We have added support for another RADIUS server, Radiator [?]. Radiator provides event hooks which we utilize to parse on-going authentication requests.<br />
<center></p>
<table class="tabulka1px">
<tbody>
<tr>
<td>username</td>
<td align="left">username used as login to RADIUS</td>
</tr>
<tr>
<td>radacctid</td>
<td align="left">Radius accounting ID</td>
</tr>
<tr>
<td>acctsessionid</td>
<td align="left">Accounting Session ID</td>
</tr>
<tr>
<td>acctuniqueid</td>
<td align="left">Accounting Unique ID</td>
</tr>
<tr>
<td>nasipaddress</td>
<td align="left">IP address of an access point</td>
</tr>
<tr>
<td>acctterminatecause</td>
<td align="left">Accounting Terminate Cause</td>
</tr>
<tr>
<td>acctstarttime</td>
<td align="left">Start timestamp of accounting (time of log in)</td>
</tr>
<tr>
<td>acctstoptime</td>
<td align="left">Stop timestamp of accounting (time of log off)</td>
</tr>
<tr>
<td>calledstationid</td>
<td align="left">ID of called station</td>
</tr>
<tr>
<td>callingstationid</td>
<td align="left">ID of calling station</td>
</tr>
</tbody>
</table>
<p class="figure-comment">Table 3: Schema of the RADIUS table in the NAV DB</p>
<p></center></p>
<p>Parsed requests are temporarily stored in a file. Another process picks up these stored requests, transforms them into the schema of the NAV radius table and performs an insert operation. The separation of Radiator and the parsing process allows messages to be stored even when the NAV database is temporarily unreachable. It also allows selecting and storing only those items which fit the current radius table schema. Probably the most important fields are username and callingstationid. In the case of an IP network we store the MAC address of a user as the callingstationid.</p>
<pre>
timestamp           username          callingstationid  result
---------------------------------------------------------------
2011-11-21 12:58:46 xsejno00@vutbr.cz 00:1d:e0:0a:d0:7d accept
2011-11-23 09:21:49 hazmuk@vutbr.cz   00:21:5c:81:61:91 accept
2011-11-23 07:32:11 xkisli01@vutbr.cz 00:13:02:51:e0:fd accept
2011-11-22 21:31:58 2763@vutbr.cz     00:26:82:1b:c0:cd accept
2011-11-22 16:02:30 xkisli01@vutbr.cz 00:13:02:51:e0:fd reject
2011-11-21 20:45:03 tpoder@vutbr.cz   58:1f:aa:82:39:6c accept
2011-11-21 20:23:54 xgregr01@fit.vutb 00:26:82:e5:6b:0f accept
</pre>
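<p>A parser for such log lines is straightforward. The sketch below (in Python, for illustration; the column layout is assumed from the listing above) produces records matching the fields of the radius table:</p>

```python
def parse_radius_line(line: str) -> dict:
    """Split one whitespace-separated log line into the fields that are
    inserted into the NAV radius table."""
    date, time, username, mac, result = line.split()
    return {
        "timestamp": f"{date} {time}",
        "username": username,
        "callingstationid": mac.lower(),  # the user's MAC address
        "result": result,
    }

rec = parse_radius_line(
    "2011-11-23 09:21:49 hazmuk@vutbr.cz 00:21:5c:81:61:91 accept")
print(rec["username"], rec["callingstationid"])
```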
<p>We have further extended the schema of the radius table with a unique user-ID. A database trigger checks whether the username in an incoming message already exists in the table. If so, its existing user-ID is reused; if not, a new ID is generated for the user. The unique user-ID makes it possible to join the collected information with other external data such as NetFlow records (this is discussed in the following sections).</p>
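<p>The behaviour of the trigger amounts to a get-or-create lookup, sketched here in Python for illustration (the real implementation is a trigger in the NAV database):</p>

```python
import itertools

_next_id = itertools.count(1)   # monotonically increasing user-ID source
_user_ids = {}                  # username -> unique user-ID

def user_id_for(username: str) -> int:
    """Return the user-ID already assigned to this username, or allocate
    a fresh one on first sight (what the database trigger does)."""
    if username not in _user_ids:
        _user_ids[username] = next(_next_id)
    return _user_ids[username]

print(user_id_for("hazmuk@vutbr.cz"))  # 1
print(user_id_for("hazmuk@vutbr.cz"))  # 1 -- same user, same ID
print(user_id_for("tpoder@vutbr.cz"))  # 2
```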
<p>The collected ARP entries (MAC-IP pairs), CAM entries (MAC-port pairs) and radius entries (username-MAC or username-IP pairs) make it possible to maintain the relations among users, IP addresses, MAC addresses and switch ports. This is especially important in an IPv6 deployment, where a user typically generates their own IP addresses and changes them regularly. Moreover, all records contain the start time and end time of their validity. This is important for handling the history-oriented queries which are typical for a data retention system.</p>
<p>NAV does not collect any information regarding the network traffic itself. Therefore it must be complemented with another tool capable of collecting such data (as depicted in Figure 5). Based on our previous experience, we chose to gather traffic metadata via NetFlow. Generating NetFlow records is typically supported by routers or dedicated probes.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/fig5.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/fig5.png" alt="The Central Monitoring System for IPv4 and IPv6 at BUT" title="The Central Monitoring System for IPv4 and IPv6 at BUT" width="700" height="687" class="aligncenter size-full wp-image-1238" /></a><br />
<strong>Fig. 5. The Central Monitoring System for IPv4 and IPv6 at BUT</strong></p>
<p>The NetFlow protocol transfers these records about passing IP flows from the probes to a collector. A NetFlow record contains a flow ID and statistics. The flow ID consists of the source and destination IP addresses, ports and protocol. The statistics include the packet count, byte count, timestamps of the first and last packets, and others.</p>
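<p>For reference, such a record can be modelled as the following structure (a simplified Python sketch; real NetFlow v5/v9 records carry additional fields such as interface indexes and TCP flags):</p>

```python
from dataclasses import dataclass

@dataclass
class FlowRecord:
    # flow ID: the classic 5-tuple
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int        # e.g. 6 = TCP, 17 = UDP
    # statistics
    packets: int
    bytes: int
    first_seen: str      # timestamp of the first packet of the flow
    last_seen: str       # timestamp of the last packet of the flow

flow = FlowRecord("2001:db8::1", "2001:db8::2", 52100, 443, 6,
                  10, 4820, "2011-11-21 12:58:46", "2011-11-21 12:58:51")
print(flow.protocol, flow.bytes)
```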
<p>The large amount of traffic on current networks translates into a large number of NetFlow records transferred to the collector. Approximate volumes of received NetFlow data are around 300 MB per hour for a loaded 100 Mbps network and 600 MB per hour for a moderately utilized 1 Gbps network, although this always depends on the specific composition and type of traffic. In any case, the crucial part of the collector is its storage and, in particular, its capacity and its write and read speeds. Since most collectors run on similar hardware (a commodity PC with extended storage capacity), the capabilities vary with the underlying storage format &#8211; either a database or raw files.</p>
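<p>A back-of-the-envelope check of these figures (assuming 48-byte records, the size of a NetFlow v5 record; export packet headers and v9/IPFIX template overhead are ignored):</p>

```python
RECORD_SIZE = 48            # bytes per NetFlow v5 record (v9/IPFIX vary)
volume_per_hour = 300e6     # ~300 MB/h observed on a loaded 100 Mbps link

records_per_second = volume_per_hour / RECORD_SIZE / 3600
print(round(records_per_second))  # ~1736 flow records per second
```

<p>Under these assumptions, the collector has to sustain on the order of a few thousand record writes per second on a 100 Mbps link, which is what makes the write throughput of the storage backend the deciding factor.</p>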
<p>A database server provides the comfort of an SQL interface and automatic data indexing, but this comes at a price. Database systems have not been optimized for NetFlow data processing. NetFlow data require fast storage and fast retrieval of large amounts of data, but no updates of the data at all, so the indexing structures of a database might not fit these requirements. Collectors using proprietary raw files to store NetFlow data must process queries at the application level, which limits the flexibility of the queries. Also, no indexing is involved, hence the retrieval of selected data may take longer than with DB systems.</p>
<p>Currently, the widespread collector nfcapd stores records in raw files. The nfdump tool is then used to access and filter these flow records. We briefly compared its performance with some well-known DB systems such as SQLite, MySQL and CouchDB. The nfcapd collector outperforms these systems quite dramatically when receiving flow records, while the times to answer queries on the stored data do not differ that much.</p>
<p>We are aware of possible DB optimizations that might improve the performance of some DB systems, but they would require a significant effort with an uncertain outcome. Therefore we selected the current nfcapd and nfdump tools to serve as the NetFlow collector. As mentioned previously, such collectors are less flexible than DB collectors, so some implementation effort is necessary to modify nfdump to suit the data retention requirements &#8211; in particular, to extend the NetFlow data with the data collected by NAV (IP-MAC and MAC-user assignments).</p>
<h3>3.2 Data store management function</h3>
<p>The Data Store Management Function consists of background processes which handle the collected data and provide an interface for the Administrative Function.<br />
As mentioned before, we use nfcapd to capture the NetFlow data generated by probes in our network, and the nfdump tool to process and access this data. The goal is to extend this data with the source and destination MAC addresses and the source and destination user-IDs. We store the corresponding username-user-ID pairs in a separate file for later reference. As a result, it is possible to track the activities of a particular machine or user. This requires modifying the nfdump file format and patching nfdump itself to understand the modified format. It also requires a tool that extracts the data of interest from the NAV DB running in parallel and continuously updates the NetFlow data stored in nfdump files.</p>
<p>The main issue with adding user-IDs to nfdump files is that there is no defined field for these IDs. There are two ways to solve this: either store the values in already available but unused fields, such as the MPLS 1 and MPLS 2 fields, or change the structure by extending the existing format with new fields. We implemented both options and each has its pros and cons. The former MPLS solution is backward-compatible with nfsen (the GUI for nfdump) since it does not require any patch of nfdump, whereas the latter solution does not allow nfsen to display the user-ID unless it is patched to understand it.</p>
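<p>One practical consequence of reusing the MPLS fields is the width constraint: an MPLS label value is only 20 bits wide, so user-IDs above 2<sup>20</sup>-1 cannot be represented. The Python sketch below packs a user-ID into a 32-bit MPLS stack entry with the label in the top 20 bits; the exact bit layout nfdump expects is an assumption here, the point being the capacity check:</p>

```python
LABEL_BITS = 20  # width of an MPLS label value

def to_mpls_field(user_id: int) -> int:
    """Encode a user-ID as the label value of a 32-bit MPLS stack entry
    (label in bits 31..12; EXP, bottom-of-stack and TTL left at zero)."""
    if not 0 <= user_id < (1 << LABEL_BITS):
        raise ValueError(f"user-ID {user_id} does not fit in a 20-bit label")
    return user_id << 12

def from_mpls_field(field: int) -> int:
    """Recover the user-ID; zero means no associated local user."""
    return field >> 12

print(from_mpls_field(to_mpls_field(4242)))  # 4242
```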
<p>Figure 6 shows the printout of nfdump when MPLS 1 and 2 are used to store the user-ID. A typical point of NetFlow observation is an Internet gateway, so it is only possible to observe communication between a local user and a host outside the administered network. As a result, one of the MPLS tags is zero, which means no association with a user identity.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure_6.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/figure_6.png" alt="User-ID stored in MPLS labels (nfdump print out)" title="User-ID stored in MPLS labels (nfdump print out)" width="794" height="627" class="aligncenter size-full wp-image-1241" /></a><br />
<strong>Fig. 6. User-ID stored in MPLS labels (nfdump print out)</strong></p>
<p>It is important to note that not all of the identification data are available at the time of arrival of the NetFlow data. For instance, the NetFlow data are available as soon as they are sent to the collector, but no information about the IP, MAC and user is available in the NAV database yet; such information is downloaded later from the router&#8217;s ND cache. The same holds for RADIUS data, but RADIUS data are not available for every user &#8211; only for those who are connected using 802.1x authentication. For other users, only the IPv6 address, the switch port number and the MAC address are used for identification.</p>
<p>An additional tool (nftool), a helper script and a helper DB are used to extract (MAC, user-ID) and (MAC, IP) pairs from the NAV DB and store them in the nfdump files. The architecture is depicted in Figure 7.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/fig7.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/fig7.png" alt=" Framework for fusion of NetFlow and user identification" title=" Framework for fusion of NetFlow and user identification" width="700" height="394" class="aligncenter size-full wp-image-1244" /></a><br />
<strong>Fig. 7. Framework for fusion of NetFlow and user identification</strong></p>
<p>The helper script is written in Perl and runs every hour (as a cron job) to import the data regarding user-IDs, MAC addresses and IP addresses into the helper DB (Berkeley DB). This relieves the NAV DB of the intensive queries issued by nftool when it figures out which MAC address and user-ID belong to each flow record in the processed nfdump file. The script assembles two helper DB tables (files): the user-ID table and the MAC table. The user-ID table consists of user-ID, MAC address, start timestamp and end timestamp. The MAC table consists of MAC address, IP address, start timestamp and end timestamp. The timestamps determine the interval during which the assignment is valid.</p>
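<p>Both helper tables answer the same kind of question: which value was valid for a given key at time t? A minimal Python sketch of such an interval lookup (Berkeley DB is replaced here by an in-memory dict and the timestamps are plain epoch seconds; the table contents are made up):</p>

```python
def lookup(table, key, t):
    """Return the value whose validity interval covers time t.
    An end timestamp of None marks an entry that has not yet expired."""
    for value, start, end in table.get(key, []):
        if start <= t and (end is None or t <= end):
            return value
    return None

# MAC table: IP address -> [(MAC, start, end), ...]
mac_table = {"2001:db8::1": [("00:21:5c:81:61:91", 1000, 2000),
                             ("00:13:02:51:e0:fd", 2500, None)]}
print(lookup(mac_table, "2001:db8::1", 1500))  # 00:21:5c:81:61:91
print(lookup(mac_table, "2001:db8::1", 9999))  # 00:13:02:51:e0:fd
print(lookup(mac_table, "2001:db8::1", 2200))  # None (gap between entries)
```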
<p>The nftool is written in C due to the large amount of data and operations it must execute to match nfdump flow records with their corresponding user-IDs and MAC addresses. The nftool runs every hour, after the helper script finishes importing data into the helper DB. It reads all nfdump files stored during the previous hour and parses them record by record. It tries to associate each record with its source and destination MAC addresses based on a match of the IP addresses in the MAC table. Based on the associated MAC address, the nftool then tries to assign a corresponding user-ID from the user-ID table.</p>
<p>The deployment in a production network revealed several issues. The start timestamps in the MAC table may be up to 15 minutes late relative to the real appearance of the assignment in the network. This delay is caused by the NAV setup, which by default scans the neighbor caches every fifteen minutes; an address may appear right after a scan of the cache and stay unregistered until the next scan. Therefore the nftool considers each MAC record valid from 15 minutes prior to its start timestamp. There are also records in which the end timestamp is missing. This may be due to entries that have not yet expired from the neighbor cache, or due to events such as a switch reboot. There may also be situations where one IP address matches multiple MAC records. It is up to the nftool to handle these corner cases correctly, e.g., to match the most probable MAC record based on the timestamps.</p>
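<p>The timestamp handling can be summarised in a single predicate (a Python sketch; the 15-minute slack mirrors the default NAV scan interval and the values are epoch seconds):</p>

```python
SLACK = 15 * 60  # NAV scans the neighbor caches every 15 minutes by default

def entry_matches(entry_start, entry_end, flow_time):
    """Accept a MAC record for a flow observed at flow_time.

    The record is accepted up to 15 minutes before its recorded start,
    because the address may have appeared just after the previous scan;
    a missing end timestamp means the entry has not yet expired."""
    if flow_time < entry_start - SLACK:
        return False
    return entry_end is None or flow_time <= entry_end

print(entry_matches(3600, 7200, 3000))  # True  (within the 15-minute slack)
print(entry_matches(3600, 7200, 2000))  # False (too early)
print(entry_matches(3600, None, 9000))  # True  (entry not yet expired)
```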
<p>The time dependency of the gathering of different data is crucial when accessing the ND Cache. This temporary memory at the router stores information needed to build the link between the IPv6 address and the MAC address. Because IPv6 addresses change in time and have limited validity, if the ND entry is lost, there is no way to link the IPv6 address and the user/host. To ensure that all information is stored properly in the monitoring system, the SNMP polling interval has to be shorter than the expiration timeout of the ND Cache. Otherwise, some entries in the ND Cache could expire without being downloaded into the central system. Typical timeouts for collecting SNMP and RADIUS data are fifteen minutes. The ND Cache expiration timeout is usually set to more than one hour.</p>
<h3>3.3 Administrative function</h3>
<p>The main task of the Administrative function is to implement the handover interface for retained data (RDHI). ETSI specifies a model for the RDHI; it consists of several layers, each providing specific functionality.</p>
<p>The message flow layer deals with communication establishment and control. It defines two operational modes: General and Authorized-Organization-initiated. In General mode, both entities are capable of full two-way transport of messages, whereas in the Authorized-Organization-initiated mode the AO must query the CSP every time it wants to receive any data. The next layer specifies the contents of each message, that is, what information must be included in the transferred data (identifiers, timestamps, etc.). The encoding and delivery layer defines techniques to handle transparent data exchange; it offers several options such as direct TCP data exchange or exchange over HTTP.</p>
<p>In order to hand over retained data collected by NAV and nfdump, we chose to implement a single HTTP client/server communication operated in the Authorized-Organization-initiated mode, i.e., the CSP runs an HTTP server and a web interface serves to answer AO requests. The administrative function is written in PHP and is interpreted by the server. The script receives a request via POST parameters filled out in the web form. The query typically contains constraints on the time interval, IP address, username or MAC address in question. Based on the constraints, an nfdump query is constructed and executed. The processing of retained data may take a while, since nfdump must filter out records that do not match the constraints. When the result is returned, the script assembles a web page which is sent to the AO. We recommend using HTTPS in order to assure secure delivery as well as authenticity.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure_8.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/figure_8.png" alt="RDHI model (taken from [2])" title="RDHI model (taken from [2])" width="464" height="533" class="aligncenter size-full wp-image-1248" /></a><br />
<strong>Fig. 8. RDHI model (taken from [<a href="#lit_2">2</a>])</strong></p>
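<p>The construction of the nfdump query from the AO constraints might look as follows (a hedged Python sketch rather than the actual PHP code; the -R and -t options and the filter syntax are standard nfdump usage, while the function name and directory layout are assumptions):</p>

```python
import shlex

def build_nfdump_cmd(ip=None, time_from=None, time_to=None,
                     nfcapd_dir="/data/nfcapd"):
    """Assemble an nfdump command line from AO query constraints."""
    filters = []
    if ip:
        filters.append("ip %s" % ip)      # nfdump filter: match src or dst IP
    cmd = ["nfdump", "-R", nfcapd_dir]    # read all capture files recursively
    if time_from and time_to:
        cmd += ["-t", "%s-%s" % (time_from, time_to)]  # time window
    if filters:
        cmd.append(" and ".join(filters))
    return " ".join(shlex.quote(c) for c in cmd)
```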
<p>Please note that the handover interface described above does not strictly conform to the ETSI standard yet. The message flow, such as request-acknowledge, must be embedded into the application level of the communication, as it is not sufficient to return HTTP status codes as control messages. Further, multiple parallel requests generated in a single session must be addressed.</p>
<h2>4 Practical experience</h2>
<h3>4.1 Network @ Brno University of Technology</h3>
<p>This chapter describes how the issues of IPv4 and IPv6 monitoring discussed above are solved in the BUT campus network. The BUT campus network includes 134 active routing devices on the backbone and thousands of connected users (especially students). The chapter presents the data and data sources required for monitoring and how they are obtained. Some results and statistics about IPv4 and IPv6 traffic are given at the end of the chapter.</p>
<p>The campus network at Brno University of Technology (BUT) was built as a result of cooperation with the other universities located in Brno and the Czech Academy of Sciences. The campus network connects several institutions (university faculties, research institutions, the Czech Academy of Sciences, high schools) at over 20 locations in different parts of Brno. Each location is connected by at least two optical cables from two independent directions to achieve maximum reliability of the network. The total length of the optical cables is over 100 km. The core architecture is depicted in Figure 9.</p>
<p>The core of the network is based on 10 Gbps Ethernet technology using HP ProCurve and Extreme Networks devices. OSPF and OSPFv3 are used as the interior routing protocols. The external connection to the National Research and Education Network (NREN), which is run by CESNET, is provided over two 10 Gbps lines with BGP and BGP+ routing. The topology of the core of the network is shown in Figure 9. From the user perspective, the BUT campus network connects more than 2,500 staff users and more than 23,000 students. The top utilization is at the student dormitories, where more than 6,000 students are connected via 100 Mb/s and 1 Gbps links. The IPv6 campus connectivity is implemented according to the Internet Transition Plan [<a href="#lit_1">1</a>]. Most parts of the university already provide native IPv6 connectivity and a significant part of the devices connected to the campus network can fully use IPv6.</p>
<h3>4.2 Practical Configuration of IPv6 at BUT</h3>
<p>In the university environment, the best practice is to identify hosts based on the hosts&#8217; IPv4 addresses. Usually, this is done by a central system for user registration that stores the users&#8217; MAC addresses. The user registers his MAC address in the system; the MAC address is then used in the DHCP configuration to assign a corresponding IPv4 address. Registered MAC addresses, together with the system logs of DHCPv4 servers and data from RADIUS servers, are sufficient to uniquely identify the user based on the IPv4 address. For a long-term history, NetFlow data are gathered using special NetFlow probes working on 10 Gbps links. DHCP logs, RADIUS logs and NetFlow records are stored at the central monitoring system, where the users&#8217; activity can be looked up as required by the Data Retention Act.</p>
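<p>For illustration, the MAC-to-IPv4 binding described above corresponds to an ISC DHCP host declaration of the following form (the addresses are hypothetical examples; the actual registration system at BUT may generate a different configuration):</p>

```text
# Hypothetical dhcpd.conf fragment: a registered MAC address is tied to a
# fixed IPv4 address, so DHCP logs plus the registration database identify
# the user behind each address.
host registered-host-0042 {
  hardware ethernet 00:11:22:33:44:55;   # MAC from the registration system
  fixed-address 147.229.10.42;           # IPv4 address assigned to the user
}
```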
<p>User monitoring of IPv6 traffic is more complicated. The IPv6 address is no longer a unique identifier of a host, as an IPv4 address is. This is mainly because of temporary addresses, as described above. There are two ways to assign IPv6 addresses. Practical experience at BUT indicates that stateful configuration using DHCPv6 does not work properly, so only stateless configuration can be deployed.</p>
<p>One of the basic problems is address assignment to the client systems. The mixture of various OSs requires a solution for automatic address assignment that is supported by most systems. Stateful autoconfiguration using DHCPv6 is very difficult to use today because of the lack of support in Windows XP, which is still a very widespread OS, and in older versions of Mac OS. Moreover, DHCPv6 does not support all configuration options (e.g. an option for the default route), so stateless autoconfiguration (SLAAC) [<a href="#lit_3">3</a>] has to be used as well. Unfortunately, stateless autoconfiguration in some operating systems turns on privacy extensions.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/fig9.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/fig9.png" alt="Topology of VUT network" title="Topology of VUT network" width="609" height="800" class="aligncenter size-full wp-image-1251" /></a><br />
<strong>Fig. 9. Topology of VUT network</strong></p>
<p>As a result, such devices use a randomly generated interface identifier instead of one derived from the MAC address (EUI-64), producing so-called Temporary IPv6 Addresses. This is a relatively new IPv6 feature that allows a node to automatically generate a random IPv6 address on its own. However, this feature contradicts the need to identify a malevolent user. Private, temporary addresses hinder the unique identification of users/hosts connecting to a service. This affects logging and prevents administrators from effectively tracking which users are accessing IPv6 services. Many internal resources require the ability to track the end user&#8217;s use of services.</p>
<p>IPv6 autoconfiguration options also increase complexity. There are two fundamentally different mechanisms and protocols, where one cannot fully work without the other. Configuration of recursive DNS servers is nowadays not possible using SLAAC, and with DHCPv6 it is not possible to configure the default gateway address (default route). As a result, the only working method is to use both protocols simultaneously. Failure of either mechanism, whether through faulty configuration, buggy software or a targeted attack, leads to denial of IPv6 connectivity for the user. Moreover, diagnostics are fairly complicated and require good knowledge of both mechanisms.</p>
<p>There are two scenarios for assigning IPv6 addresses: both stateless and stateful IPv6 configuration are used.</p>
<h3>4.3 Installed DR components</h3>
<p>In order to feed our data retention system with information about the IP and MAC addresses of connected users, we collect ARP tables and FDB tables from all routing switches located in different buildings of the BUT campus. Each of these switches serves as a gateway for a given building. We set up NAV to poll these switches regularly every 15 minutes; if a change occurs, it is logged in the NAV database. The NAV system runs on a dedicated machine.</p>
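<p>The change-detection principle behind this polling can be sketched as follows (an illustrative Python fragment; NAV's actual schema differs, and the data structures here are assumptions):</p>

```python
# Only differences between successive ARP/FDB snapshots are logged, so the
# database effectively stores (address, start, end) intervals rather than a
# full copy of every poll.

def diff_snapshots(prev, curr, now):
    """prev/curr map IP -> MAC; returns events to log at time `now`."""
    events = []
    for ip, mac in curr.items():
        if prev.get(ip) != mac:
            events.append(("start", ip, mac, now))   # new or changed binding
    for ip, mac in prev.items():
        if curr.get(ip) != mac:
            events.append(("end", ip, mac, now))     # binding disappeared
    return events
```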
<p>The basic NetFlow data are obtained from three monitoring probes installed in different parts of the campus network. Two of them are located on the upstream lines to collect complete data exchanged between the campus network and the rest of the Internet. The third probe was installed at the students&#8217; dormitories, where most of the network traffic is concentrated. In cooperation with the company INVEA-TECH, the probes were ported to the HP ProCurve ONE service module. This allowed processing data directly from the backplane of the switch. Data obtained from the probes are collected on a single NetFlow collector. The collector pulls data from the NAV machine and merges these data with NetFlow.</p>
<h3>4.4 Obtained results</h3>
<p>Many interesting statistics were obtained as a side result of the implementation of the data retention system. Figure 10 displays the visibility of a single machine under various IP addresses. The first IP address, starting with fe80, is the link-local address, which remains constant for the whole period of observation. The same holds for the IPv4 address. The machine has multiple self-generated IPv6 addresses which are used in an ad-hoc manner.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure_10.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/figure_10.png" alt="Visibility of a computer under various IPv6 addresses" title="Visibility of a computer under various IPv6 addresses" width="737" height="346" class="aligncenter size-full wp-image-1256" /></a><br />
<strong>Fig. 10. Visibility of a computer under various IPv6 addresses</strong></p>
<p>Since the deployment of the system we have been able to observe differences between IPv4 and IPv6. In total, we have observed 41032 unique MAC addresses. There have been 18480 MAC addresses which have been visible under some IPv6 address. Nearly all of these MAC addresses (all but about 100) have been associated with a link-local IPv6 address. There have been 26277 unique IPv6 link-local addresses. More importantly, there have been 13733 MAC addresses with nearly a million global IPv6 addresses. This means that on average each IPv6-capable machine changed its global IPv6 address more than 60 times. On the other hand, we have seen only 43786 unique IPv4 addresses in total. The observation period was approximately 10 months.</p>
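<p>The quoted average follows directly from the reported counts (using the approximate figure of one million global addresses):</p>

```python
# Arithmetic behind the "more than 60 times" figure quoted above.
global_addresses = 1_000_000      # "nearly a million" global IPv6 addresses
ipv6_macs = 13733                 # MAC addresses seen with global addresses
changes_per_host = global_addresses / ipv6_macs   # roughly 72.8
```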
<p>We have also observed the evolution of the traffic during a shorter period (a week; the gray background marks weekends) with a timescale resolution of one hour. Some of the findings are plotted in Figures 11, 12 and 13.</p>
<p>Figure 11 shows the number of hosts with an assigned IPv4 or IPv6 address. We consider a host to be uniquely identified by its MAC address. We can see that the number of hosts follows a daily pattern, with a significant decrease from Friday until Sunday afternoon, when students return to the dormitories. The number of IPv6 hosts is close to the number of IPv4 hosts. In comparison to the total statistics presented above, the ratio of IPv6 to IPv4 hosts has increased significantly. This increase is most likely caused by the migration of users to newer operating systems during this year.</p>
<p>We have also focused on the IPv6-capable hosts that actively utilize IPv6 during communication. The graph in Figure 12 displays the number of internal and external hosts involved in active communication. The term internal host means a host which belongs to the BUT network, whereas an external host is located outside of the BUT network. We can observe that the number of internal IPv6 hosts is smaller than the number of external IPv6 hosts, i.e., an internal host communicates on average with more than two external IPv6 hosts.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/fig11.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/fig11.png" alt="Number of clients using IPv4 or IPv6" title="Number of clients using IPv4 or IPv6" width="700" height="221" class="aligncenter size-full wp-image-1259" /></a><br />
<strong>Fig. 11. Number of clients using IPv4 or IPv6</strong><br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/fig12.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/fig12.png" alt="Number of internal and external IPv6 hosts" title="Number of internal and external IPv6 hosts" width="700" height="221" class="aligncenter size-full wp-image-1260" /></a><br />
<strong>Fig. 12. Number of internal and external IPv6 hosts</strong></p>
<p>Finally, the graph in Figure 13 displays the amount of traffic with respect to the IP protocol used. We introduce a third category which accounts for tunneled traffic such as Teredo. The amount of traffic strongly follows the daily and weekly periods. The amount of ingress traffic is significantly larger than the amount of egress traffic for both IP protocols. On average, the amount of IPv4 traffic is ten times higher than that of IPv6, which is in turn ten times higher than the amount of traffic utilizing tunneling mechanisms. The large difference between the amounts of IPv4 and IPv6 traffic is in contrast with the small difference in the numbers of IPv4- and IPv6-capable hosts. Nevertheless, this discrepancy is expected as a result of the limited support of IPv6 by network applications.</p>
<p>Please bear in mind that the presented statistics are valid for a campus network, which might be specific due to its users and a strong effort to keep up with the IPv6 transition plan. A commercial provider might observe different statistics; we would expect to see an even larger difference in the amounts of IPv4 and IPv6 traffic. The main cause could be older user operating systems and missing native support of IPv6. In such a case, tunneling mechanisms come into play and there might be only tunneled IPv6 or IPv4 traffic.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/fig13.png"><img src="http://6lab.cz/new/wp-content/uploads/2012/08/fig13.png" alt="Breakdown of the traffic mix based on IP protocol" title="Breakdown of the traffic mix based on IP protocol" width="700" height="221" class="aligncenter size-full wp-image-1266" /></a><br />
<strong>Fig. 13. Breakdown of the traffic mix based on IP protocol</strong></p>
<h2>5 Conclusions</h2>
<p>This technical report focused on the design and implementation of a data retention system in an IP network. Emphasis was placed on addressing issues related to the deployment of IPv6 in terms of recovering user identity.<br />
The designed system consists of several monitoring tools whose results are combined to obtain data about past and on-going traffic, enriched with information about user identity. The system has been successfully deployed in the BUT network. Since its deployment it has been used to manage violations of the network usage policy and to observe IPv6 network behavior and its trends.</p>
<h2>References</h2>
<ol>
<li><a name='lit_1'></a>Curran, J.: RFC 5211, An Internet Transition Plan. July 2008. URL http://tools.ietf.org/html/rfc5211</li>
<li><a name='lit_2'></a>European Telecommunications Standards Institute: ETSI TS 102 657: Lawful Interception (LI); Retained data handling; Handover interface for the request and delivery of retained data. December 2009, version 1.4.1.</li>
<li><a name='lit_3'></a>Thomson, S.; et al.: RFC 4862, IPv6 Stateless Address Autoconfiguration. September 2007. URL http://tools.ietf.org/html/rfc4862</li>
<li><a name='lit_4'></a>UNINETT and Norwegian University of Science and Technology: NAV. March 2011, version 3.9.<br />
URL http://metanav.uninett.no</li>
</ol>
<div  class="x-author-box cf" ><h6 class="h-about-the-author">About the Author</h6><div class="x-author-info"><h4 class="h-author mtn">Martin Zadnik</h4><a href="http://www.fit.vutbr.cz/~izadnik/" class="x-author-social" title="Visit the website for Martin Zadnik" target="_blank"><i class="x-icon-globe"></i> http://www.fit.vutbr.cz/~izadnik/</a><span class="x-author-social"><i class="x-icon-envelope"></i> izadnik@fit.vutbr.cz</span><p class="p-author mbn"></p></div></div>
]]></content:encoded>
			<wfw:commentRss>http://6lab.cz/design-of-data-retention-system-in-ipv6-network/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Network Monitoring Based on IP Data Flows</title>
		<link>http://6lab.cz/network-monitoring-based-on-ip-data-flows/</link>
		<comments>http://6lab.cz/network-monitoring-based-on-ip-data-flows/#comments</comments>
		<pubDate>Sat, 13 Mar 2010 09:15:34 +0000</pubDate>
		<dc:creator><![CDATA[Martin Zadnik]]></dc:creator>
				<category><![CDATA[Monitoring]]></category>
		<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://ipv6.vutbr.cz/?p=810</guid>
		<description><![CDATA[Do you monitor your network? Try to answer the following questions. Which users and which services use the most network bandwidth, and do they exceed authorised limits? Do users use only the permitted services, or do they occasionally "chat" with friends during work hours? Is my network scanned or assaulted by attackers? NetFlow will answer these and other questions.
In the network world, NetFlow is synonymous with monitoring IP data flows. A flow is generally defined as a sequence of packets which share a common feature and pass through an observation point. In the NetFlow terminology this definition is narrowed down to a one-way packet sequence with identical source and destination IP addresses, source and destination ports and protocol number. Various indicators are monitored for each such quintuple, for instance, the duration or the amount of data transferred for a flow.]]></description>
				<content:encoded><![CDATA[<h2>1 Approaches Used for Network Monitoring</h2>
<p>Monitoring of present-day networks can be divided into two basic groups. The first one is based on inspection of packet contents. The contents of the packet are compared to a fairly large database of known samples (regular expressions), and if a match is found a relevant action takes place, for example, communication from the computer that sent the packet is blocked. Most contemporary <em>IDS</em> (Intrusion Detection Systems) are based on this principle.</p>
<p>The second group concerns the collection and analysis of statistics describing network behaviour. Statistics are gathered with various levels of detail, depending on what information we are willing to omit. Basic information is obtained by monitoring the status of key network components, for instance, the values of <em>SNMP</em> counters at network interfaces. The collected data are very approximate because the counters aggregate information about all traffic. Another option is to use the <em>RMON</em> (Remote Monitoring) architecture. <em>RMON</em> agents are able to carry out several actions, such as collecting statistics about interfaces (network load, <em>CRC</em> errors etc.), creating a history out of selected statistics, raising alarms when statistics thresholds are exceeded and generating events (sending alerts). Some agents allow setting up statistics monitoring for several selected users.</p>
<p>This is insufficient from the perspective of present-day networks. Nowadays companies build their infrastructure (stock exchange and payment transactions, IP telephony, e-commerce) on reliable and secure networks, and hence advanced technologies must be used to get detailed statistics.</p>
<p>The NetFlow technology introduced by Cisco in the late 1990s is among the most popular technologies today. The popularity of NetFlow is due to its convenient level of abstraction. The whole traffic mix is divided into flows based on a quintuple of key data, and besides the quintuple (source and destination IP address, source and destination port, protocol number) other statistics are also monitored for each flow, such as the number of packets and bytes, the time of flow start and end, set <em>TCP</em> flags and more. The collected data can help you identify and locate network incidents, show network load, tune QoS settings etc. The NetFlow data can be further aggregated and thus various views of network traffic may be created and important information about applications and users may be obtained, and these can be used later for strategic planning of company development or to verify conformance to network usage policy.</p>
<p>Examples include observing limits for the amount of incoming/outgoing traffic from/to a local network, where the IP-address-based aggregation and sorting according to the number of bytes will display the statistics of the users that violate their limits most heavily.</p>
<h2>2 NetFlow Architecture</h2>
<p>NetFlow architecture is based on two types of components. The first type comprises components (let us call them agents) able to collect statistics (records about flows) and send them through the NetFlow protocol. The other type is a collector that receives the measured statistics and saves them for further analysis. Agents can be implemented in routers or in autonomous probes placed at important network locations, for example, at the gateway of a local network to the Internet, at campus network nodes or in data centres. A collector can be placed anywhere on the network and collect information from several exporters at the same time. The network administrator accesses the data through a web interface or a terminal.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure11.jpg"><img class="aligncenter size-full wp-image-826" title="Figure 1: Architecture to measure flows based on the NetFlow protocol" src="http://6lab.cz/new/wp-content/uploads/2012/08/figure11.jpg" alt="Figure 1: Architecture to measure flows based on the NetFlow protocol" width="500" height="343" /></a><br />
<strong>Picture 1: Architecture to measure flows based on the NetFlow protocol</strong></p>
<h2>3 NetFlow Agent</h2>
<p>The agent runs two important processes, a measuring one and an exporting one. The measuring process consists of several subtasks. The most basic subtask is receiving the packet, assigning a unique timestamp to it and extracting important items from the packet header. Then the relevant flow record must be found in the memory. Records are usually organised into an array of lists, and the address of the relevant list is obtained by calculating a hash value out of the five key flow items (addresses, ports, protocol). The correct record is then found sequentially in the list. If no such record exists, the packet is the first packet of the flow and a new record is therefore created. Subsequent packets of the given flow contribute to the statistics of the same record. When the flow ends, the record must be released from the memory so that it does not needlessly occupy space for newly created records. This is not so simple. For <em>TCP</em> connections, the end of communication can be detected by looking for the <em>FIN</em> and <em>RESET</em> flags in packets, but if such a packet goes through a different route or gets lost, the record will stay in the memory for quite a long time. Furthermore, such flags do not exist for other protocols (<em>UDP</em>, <em>ICMP</em> and others). Besides the flags mentioned above, a heuristic is also used: the flow is considered ended if no packet has arrived for a given record for a long time. Records are also released prematurely if the agent is saturated with new flows. After a record is released, it is handed to the exporting process, which processes the records and creates NetFlow protocol packets out of them.</p>
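<p>The lookup-and-update cycle described above can be sketched as follows (a minimal Python illustration of the principle, not a real agent; a dictionary stands in for the hash-addressed array of lists, and the field names are assumptions):</p>

```python
# Flow records are kept in a hash table keyed by the NetFlow five-tuple; the
# first packet of a flow creates the record, subsequent packets update its
# counters and timestamps.

def update_flow(cache, pkt, ts):
    key = (pkt["src_ip"], pkt["dst_ip"], pkt["src_port"],
           pkt["dst_port"], pkt["proto"])        # the NetFlow five-tuple
    rec = cache.get(key)
    if rec is None:                              # first packet -> new record
        rec = {"packets": 0, "bytes": 0, "first": ts, "last": ts}
        cache[key] = rec
    rec["packets"] += 1
    rec["bytes"] += pkt["length"]
    rec["last"] = ts
    return rec
```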
<h3>3.1 Agent Parameters and Configuration</h3>
<p>We can tailor the measuring and exporting processes to the network and administrator needs. For instance, if the agent runs on a router, the parameters must be set so that the router can still carry out its main function, i.e. routing.</p>
<p>Sampling is the first parameter that has an impact on the performance of the agent. It determines the probability that a passing packet becomes the subject of monitoring. Decreasing the probability lowers the load of the measuring process, but unfortunately the quality of the exported statistics is lowered as well. When the probability is too low (usually less than 1:10), some characteristics that can be deduced from unsampled data disappear (e.g., the original number of flows). Some NetFlow agents explicitly require sampling with a low probability; therefore it is advisable to learn about the capabilities of the agent before you start monitoring.</p>
<p>The inactive timeout is another parameter that influences performance; specifically, it influences the size of the allocated memory. The inactive timeout is the interval used as a heuristic to release records from memory (if a record was not updated during this interval, it is released). Too low an inactive timeout causes a premature release of records: several records are then created for a single flow, which resembles the broken-spaghetti effect (lots of short flows). Increasing the timeout improves the situation, but the allocated memory grows. The optimal timeout for present-day networks is 10 to 30 seconds.</p>
<p>So far we have described how to release a record after the flow ends. But what if the flow lasts too long? Imagine a user with a slow Internet connection downloading the latest Linux distribution. To let the administrator learn about such events in time, the so-called active timeout is introduced: if a flow lasts longer than this interval, its record is released from memory and reported.</p>
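<p>The two expiration rules can be summarized in a short sketch (the timeout values follow the text; the record structure is an assumption):</p>

```python
# A record is released when no packet updated it for `inactive` seconds, or
# when the flow has lasted longer than `active` seconds, so long-lived flows
# are still reported periodically.

def expired(rec, now, inactive=30, active=300):
    if now - rec["last"] > inactive:   # flow appears to have ended
        return True
    if now - rec["first"] > active:    # long-lived flow: export a slice
        return True
    return False
```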
<p>For the export process it is necessary to set the destination of the measured statistics, i.e. the collector’s IP address and port, or several collectors if the exporter supports that. For some exporters, sampling of outgoing NetFlow data can be configured to prevent the collector from being overloaded.</p>
<p>The agent configuration is not provided by the NetFlow protocol, but rather by proprietary methods using a terminal or web interface. For example, with Cisco routers you can use the following commands to launch and configure NetFlow (2800, IOS 12.4):</p>
<p><strong>Global NetFlow configuration at the router (definition of destination collector and export protocol version)</strong></p>
<pre>Router(config)#ip flow-export destination 192.168.0.100 60001
Router(config)#ip flow-export version 5</pre>
<p><strong> Configuration of monitored networks</strong></p>
<pre>Router(config-if)#ip flow egress</pre>
<p><strong>Checking flow memory and export</strong></p>
<pre>Router#show ip cache flow
Router#show ip flow export</pre>
<h2>4 Collector</h2>
<p>Just like an agent, a collector runs several processes. The basic process is saving data received from NetFlow agents in a defined format to specified storage. Depending on the type of collector you can encounter other processes such as the presentation process displaying saved data (usually through a web interface) or a process analysing received data (development trends, calculations of long-term statistics, discovering network anomalies). Apart from these processes the administrator can access the stored NetFlow data at any time and collect information he is currently interested in.</p>
<p>The saving process may be implemented differently for different collectors. The two most important formats for saving NetFlow data are saving to standard databases (for example <em>MySQL</em>), where database server services are used and the subsequent analysis is run via <em>SQL</em> queries, or saving directly into binary files in a specific format depending on the collector.</p>
<p>The advantage of database collectors is easy query processing for saved data, because the database system does most of the work. Conversely, for binary collectors the queries are implemented in the application and the creator of the collector decides what query types to support. Poor performance while accepting NetFlow data, i.e. inserting them into the database, is a clear-cut disadvantage of database-based collectors.</p>
<p>Approximate volumes of saved NetFlow data are around 300 MB per hour for a loaded 100 Mb/s network and 600 MB per hour for a moderately utilized 1 Gb/s network, but it always depends on the specific composition and type of traffic. Most collectors therefore provide advanced tools for long-term administration of saved NetFlow data, such as automatically replacing the oldest data with new data when a predetermined level of storage allocation is reached, or decreasing the granularity (level of detail) of older data while maintaining all the details for the newest flows. This also reflects the way NetFlow data are used: incidents are dealt with immediately or within a few days at most, while older data are used for top-N statistics and trend monitoring.</p>
<p>The presentation and analysis processes always depend on a specific collector implementation. Usually there is a graphic interface accessible through a web interface, with optional display of graphs from received data on various time scales, filtering the defined traffic type only, displaying waveforms according to the amount of transferred data, number of flows or packets, summary statistics for a selected period, list of the largest data transfers and IP addresses with the highest load, trend estimates etc.</p>
<h3>4.1 Collector Parameters and Configuration</h3>
<p>Collectors can be differentiated according to the amount and intensity of the incoming NetFlow data or according to the subsequent usage of stored data.</p>
<p>Sampling of incoming NetFlow data is the first critical parameter. A surplus of performance capacity should be maintained, especially for heavily loaded collectors. For database collectors, a higher sampling rate must almost always be set to prevent unexpected packet loss. This is in principle also true for collectors which use their own method of saving data to disk storage, but these are less sensitive to the amount of NetFlow data and are usually able to process all data without sampling.</p>
<p>The time intervals at which the data are saved are another parameter. One- or five-minute intervals are common, but other intervals can be defined as well.</p>
<p>Most collectors let you specify NetFlow data sources and ports where NetFlow data can be accepted. These countermeasures make it harder for an adversary to pollute your NetFlow storage with forged data, which is relatively easy because the <em>UDP</em> transport protocol is used.</p>
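<p>A minimal sketch of such source filtering follows (the addresses and port are illustrative; as noted above, this mitigates, but does not prevent, forged UDP sources):</p>

```python
import socket

# A collector accepts NetFlow datagrams only from explicitly configured
# exporter addresses and a fixed port.

ALLOWED_EXPORTERS = {"192.168.0.10", "192.168.0.11"}

def accept_datagram(src_ip, allowed=ALLOWED_EXPORTERS):
    """Drop datagrams from unknown sources; UDP source addresses are easily
    forged, so this is a countermeasure, not real authentication."""
    return src_ip in allowed

def make_socket(port=60001):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))     # listen on the configured port only
    return sock
```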
<h2>5 NetFlow Protocol</h2>
<p>The most widespread protocol today is NetFlow v5, which is the de facto standard for transmitting flow data. Thanks to its simple structure it is widely supported by both exporters and collectors. The NetFlow v5 record format contains only the source and destination IPv4 address, source and destination port, protocol number, start and end timestamp, the number of transferred packets and bytes, TCP flags, the ToS/DiffServ field and the numbers of the network interfaces at which the flow was measured.</p>
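<p>The fixed v5 record layout can be parsed with a few lines of code; the following Python sketch decodes a subset of the fields (the 48-byte layout — addresses, interface indexes, counters, timestamps, ports, flags — is the published v5 format, while the function name is our own):</p>

```python
import struct, socket

# NetFlow v5 record: srcaddr, dstaddr, nexthop, input/output interface,
# packet and byte counters, first/last timestamps, ports, pad, TCP flags,
# protocol, ToS, AS numbers, masks, pad (48 bytes total).
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")

def parse_v5_record(data):
    (src, dst, nexthop, in_if, out_if, pkts, octets, first, last,
     sport, dport, flags, proto, tos, src_as, dst_as,
     smask, dmask) = V5_RECORD.unpack(data)
    return {
        "src": socket.inet_ntoa(src), "dst": socket.inet_ntoa(dst),
        "sport": sport, "dport": dport, "proto": proto,
        "packets": pkts, "bytes": octets,
        "tcp_flags": flags, "tos": tos,
    }
```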
<p>Nowadays this format appears to be restrictive, and a flexible record format defined by the NetFlow v9 protocol is being introduced. Introduction of templates is the most important difference between v5 and v9. Templates are used to define your own record structure and record items. The user will thus define which values she wants to export and how much space she wishes to assign to them. The collector will process these templates and interpret the incoming records based on them. Moreover, the exporter can send much more information about the agent itself (number of received packets, discarded packets, number of sent flows etc.) when using NetFlow v9. Unfortunately, neither NetFlow v5 nor NetFlow v9 supports secure data transfer from agent to collector, because they are assumed to be placed on the same private network. This is not always possible and then records are transmitted over the Internet. In such cases the protocol is susceptible to eavesdropping and submission of forged records. Moreover, the records can get lost because the UDP transport protocol is used; for NetFlow v9 <em>SCTP</em> (Stream Control Transmission Protocol) can also be used.</p>
<p>The NetFlow protocols were developed by Cisco and as such are a proprietary solution (NetFlow v9 is described in the informational RFC 3954). The IETF (Internet Engineering Task Force) therefore started to work on a more general, broader definition of a protocol for transferring flow records called <em>IPFIX</em> (IP Flow Information Export). This definition is based upon NetFlow v9, from which it adopts the use of templates, but it defines possible record items more precisely and defines more measurable values. Unlike NetFlow, IPFIX mandates support for <em>PR-SCTP</em> (RFC 3758) for transporting data, a reliable and congestion-aware protocol. The definition has been published in several RFC documents (RFC 3917, 5101, 5472, and others). IPFIX is expected to replace NetFlow, and presently there are several prototype implementations of exporters and collectors that work with the IPFIX protocol.</p>
<h2>6 NetFlow Data Analysis</h2>
<p>NetFlow data are received at the collector side, where they are saved to disk and subsequently analysed. The analysis either runs automatically or the user drives it interactively with queries. This creates various views of the network traffic and yields important data, for instance, the daily traffic distribution, information about users and much more.</p>
<p>The following example shows how to discover users who violate the limit on outgoing traffic from the local network to the Internet. The process is demonstrated with the publicly available NfSen collector.</p>
<ol>
<li>First we select a time interval of interest in the chart (picture 2).<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure21.jpg"><img class="aligncenter size-full wp-image-837" title="Figure 2: Traffic Amount vs Time Chart" src="http://6lab.cz/new/wp-content/uploads/2012/08/figure21.jpg" alt="Traffic Amount vs Time Chart" width="720" height="428" /></a><br />
<strong>Picture 2: Traffic Amount vs Time Chart</strong></li>
<li>We will select aggregation of NetFlow records according to source IP addresses and sorting according to number of bytes, and for brevity’s sake we will be interested in the first five only.</li>
<li>Listing (pic. 3) will display a table of the users who violate their limit most heavily.</li>
<li>If we want to learn whom those users communicated with, we can list all records containing the user&#8217;s source IP address and sort these records again according to the amount of transferred data.<br />
<a href="http://6lab.cz/new/wp-content/uploads/2012/08/figure3.jpg"><img class="aligncenter size-full wp-image-843" title="Figure 3: Top-10 users with the largest outgoing traffic" src="http://6lab.cz/new/wp-content/uploads/2012/08/figure3.jpg" alt="Top-10 users with the largest outgoing traffic" width="700" height="226" /></a><br />
<strong>Picture 3: Top-10 users with the largest outgoing traffic</strong></li>
</ol>
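<p>The aggregation in step 2 can also be sketched outside NfSen. The sketch below assumes flow records already parsed into dictionaries with hypothetical <em>src</em> and <em>bytes</em> keys; it mirrors "aggregate by source IP, sort by bytes, take the top five".</p>
<pre>
```python
from collections import defaultdict

def top_talkers(flows, n=5):
    """Aggregate flow records by source IP and return the n sources
    that sent the most bytes, mirroring the NfSen listing in step 2."""
    totals = defaultdict(int)
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```
</pre>
<p>With nfdump, the NfSen back end, the equivalent query is an aggregation by source IP sorted by bytes with a top-N limit.</p>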
<h2>7 Usage</h2>
<p>If you recall the NetFlow v5 record definition, it is clear that it is possible to discover who talks to whom, for how long, and which application is used. Many more kinds of information can be gathered with the advent of the NetFlow v9 and IPFIX protocols.</p>
<p>Measuring network traffic based on flows has many practical applications which contribute to network reliability and security. NetFlow implementations and use differ according to traffic and network characteristics. The most popular applications of NetFlow include the following areas.</p>
<ul>
<li>Observing limits and security policies. NetFlow data can be used to monitor how users comply with the network usage policy, for example, whether per-user limits for incoming and outgoing traffic are exceeded. Moreover, with the help of NetFlow data we can display a spectrum of network traffic, i.e., the distribution of data between services, and focus on the most used services.</li>
<li>Locating illegally installed servers on the network is an example of enforcing security policies. Imagine a common user who brings her laptop to work and connects it to the company network. An <em>FTP</em> server is running on this laptop; Internet users connect to this server and download data, which increases the amount of outgoing traffic and decreases the capacity available for legitimate traffic. Such a server can be exposed by pairing NetFlow data about incoming and outgoing flows and comparing which flow started first. If the incoming flow initiated the communication, then we have just revealed a server.</li>
<li>Another group of undesirable applications are <em>P2P</em> networks. Locating them on the network is advisable for several reasons: the data sent and the parties communicating over these networks are not trustworthy, illegal data are shared (BitTorrent), and some applications disturb people (ICQ, Messenger, &#8230;). Analysis of NetFlow data makes it possible to reveal such applications by observing known ports; for instance, BitTorrent traditionally opens several connections on ports 6881-6889. However, most modern P2P applications are not permanently bound to specific ports, and if the standard ports are blocked they scan ports and try to connect to the central server using an open one. Such applications can be detected by the specific IP address (address prefix) of the central server to which the application normally connects in order to log in to the network. Unfortunately, once such applications become blocked on the firewall, they start to mask their activity in various ways and their detection becomes difficult. A blocked Skype can create an <em>HTTP</em> or <em>HTTPS</em> connection to a secret proxy and through such a proxy reach the central server.</li>
<li>Detection of attacks and suspicious activities is a very broad subject; this report therefore covers only a few examples, describing how they appear in NetFlow data and how to find them there. A search for an incident can be carried out directly on flows coming from the attacker, or on the traffic generated by attacked computers in reaction to the attack.
<ul>
<li>Most attacks are preceded by scans of IP addresses and ports. The attacker is in this way looking for ports on which applications listen, so that she can exploit their vulnerabilities. There are two types of scans: vertical and horizontal. A vertical scan means scanning the ports of a single computer. The list of open ports gives the attacker information about the type and version of the operating system and the possibility to exploit a known vulnerability. Conversely, with a horizontal scan the attacker selects a particular application (port) and tries to discover which computers run this application.
<p>Both types of scans are revealed by an increased number of flows (if the scan is intensive enough). The administrator can then focus on the relevant section and search via aggregation which target user accounts for the largest increase in flows, and then find out who the scanner is (vertical scan discovery). Aggregation by ports can discover an unusual increase in flows for a specific port and locate a scanning user (horizontal scan discovery). Scanning flows will contain only a small number of packets (usually one packet).</p>
<p>When analysing the reaction of attacked computers, we should focus on any unusual increase in TCP RST packets, which a computer generates in response to a TCP SYN packet on a closed port (vertical scan). To detect a horizontal scan we monitor increased numbers of ICMP Host Unreachable messages. UDP traffic can be analysed in a similar manner.</p></li>
<li>Detection of Denial-of-Service (DoS) attacks. The attacker tries to deny legitimate users access by exhausting server or even network resources. A well-known attack is the TCP SYN flood, where the attacker keeps opening new TCP connections, which depletes the victim’s resources. The victim may try to close the connections by sending TCP packets with the RST flag set.
<p>This can be found in NetFlow data by looking for a large number of TCP flows containing a single packet, or in a roundabout way by monitoring an increased number of RST flags in the opposite direction of communication. Trying to identify the attacker is useless, because the source addresses are usually forged.</p>
<p>DoS attacks are nowadays usually run from so-called botnets: networks of computers belonging to ordinary users to which the attacker has gained unauthorised access. Such computers generate legitimate requests for services (for instance, web page requests), but their sheer number saturates the server. Such attacks resemble a flash-crowd effect, where many users try to connect to the server at once, for instance, to download the newest music video. In both cases, the administrator should be informed.</p></li>
<li>Flow monitoring can expose the spread of worms. Worms, unlike viruses, spread using the file system or network. The infected computer opens new connections to other computers and the worm tries to exploit the application vulnerability to spread. In order to identify compromised computers, look for large numbers of unexpected open connections to other computers.</li>
</ul>
</li>
<li>Quality of Service (<em>QoS</em>) monitoring is another field where flow monitoring can be used. Unlike active QoS measurement, where special packets are inserted into the network traffic, this is a passive measurement: QoS parameters are measured on real traffic. This is advantageous because measurement runs on user data and the network load does not increase, but on the other hand the experiments cannot be controlled precisely and the measured data might be biased. Usually, data from multiple agents must be compared for so-called one-way measurement, which requires exact time synchronisation at all measurement points (using GPS, for instance). Delay can thus be measured for all network flows on components and transport lines between agents. This can be used to monitor varying processing times for specific traffic; for instance, processing of multicast packets on routers may take longer because it is more complicated than that of normal traffic.
<p>A flow is considered one-way in NetFlow; however, features of the two-way connection, such as <em>RTT</em> (round-trip time), can be measured as well, but then the relevant flows must be paired &#8211; outgoing and incoming.</p></li>
<li>Flow monitoring can be used for optimisation purposes. Imagine that a company has an <em>SLA</em> (Service Level Agreement) with its Internet Service Provider to guarantee bandwidth for VoIP traffic. Using NetFlow, it is possible to verify the assigned bandwidth and to plan a decrease or increase of bandwidth according to the maximum load. NetFlow statistics can also help balance routing between ISPs and plan peering strategies thanks to the knowledge of the source, destination and intermediate nodes that data travel through.</li>
<li>NetFlow records can be useful for accounting of transferred data or services. Since the statistics contain information about communicating parties, time and amount of transferred data, it is possible to charge individual users according to the communication interval, time of day and amount of data transferred.</li>
</ul>
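<p>The server-exposure heuristic from the list above (pair the two directions of a connection and check which flow started first) can be sketched as follows. Field names and the flat dictionary layout are our own simplification; real flow records also carry timestamps with finer resolution and need handling of unpaired flows.</p>
<pre>
```python
def find_local_servers(flows):
    """Pair flows by reversed 5-tuple; if the flow towards a host began
    before the matching reply flow, that host accepted the connection,
    i.e. it is acting as a server."""
    by_key = {(f["src"], f["sport"], f["dst"], f["dport"]): f for f in flows}
    servers = set()
    for key, f in by_key.items():
        reverse = (key[2], key[3], key[0], key[1])
        mate = by_key.get(reverse)
        # the flow that started first is the one that initiated contact
        if mate and f["start"] < mate["start"]:
            servers.add((f["dst"], f["dport"]))
    return servers
```
</pre>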
<h2>8 Ethical Perspective of NetFlow</h2>
<p>If we compare flow measurement to packet content analysis, we find that NetFlow describes network traffic from the perspective of its behaviour, not from the perspective of the data itself. Thus it is not possible to use NetFlow alone to block a specific flow that a worm uses to spread, which is generally possible when inspecting packet contents. In NetFlow data, however, it is possible to see that an attacked computer initiates too many outgoing connections, which is suspicious; we can then focus on such a computer and possibly block its communication at the firewall.</p>
<p>Monitoring flows appears to be most acceptable by all parties concerned (ISPs and users) from the ethical point of view. A simple explanation of the difference between NetFlow and packet inspection can be given by making a comparison with the postal services. Looking at packet contents is de-facto opening the envelope and searching through its contents, while flow monitoring is reading and copying the information on the envelope about sender and addressee, which are publicly known anyway.</p>
<p>In this respect, monitoring based on flows often meets the requirements of national legislation on the collection of operational and location data about electronic communications by network operators.</p>
<h2>9 Available Solutions</h2>
<p>Cisco routers were originally the only components capable of collecting and exporting NetFlow data. Thanks to the popularity of NetFlow, software agents were written for common PCs, and thus autonomous NetFlow probes were created. Real usage has shown that ordinary computers with no hardware support suffer from packet loss during peaks of network traffic or on lines with intense traffic. Hardware-accelerated probes therefore appeared; these consist of a special network card and a computer.</p>
<p>If a Cisco device is on the network, the first option for flow monitoring is to configure <em>IOS</em> to monitor and export NetFlow data. NetFlow can run on Cisco <em>IOS</em> routers of the 800 to 7500 series, and also on the Cisco Catalyst 6500 switch and routers of the 7600, 10000, 12000 and CRS-1 series. The measured statistics are exported to one or more collectors in the NetFlow v5 or v9 format. The most interesting features of Cisco NetFlow include incoming traffic filtering (measurement runs only on a subset of the total traffic), support for monitoring <em>MPLS</em> (Multiprotocol Label Switching) packets and the definition of additional items for security analysis. Features of individual agents can differ with different versions of <em>IOS</em> and types of device.</p>
<p>NetFlow data collection consumes the routers’ computing resources according to current network traffic and configured parameters of the monitoring process. Before starting experimentation it is advisable to visit a Cisco page (<a href="http://www.cisco.com/go/netflow">www.cisco.com/go/netflow</a>, NetFlow Performance Analysis) where you can find details about allocated computational resources for individual devices and types of network traffic.</p>
<p>The amount of allocated resources can be one of the reasons to use autonomous NetFlow probes. The advantage of such a solution is the fact that the router carries out its primary function, i.e. routing, and is not burdened with another task. As a consequence, experimenting with a NetFlow probe has no impact on the network traffic. NetFlow probes will naturally be used on networks with no source of NetFlow data, e.g., not built upon Cisco technology.</p>
<p>The Czech company INVEA-TECH (www.invea-tech.com) offers an interesting solution in this area. Its portfolio includes probes (called FlowMon) capable of measuring networks from 10 Mb/s to 10 Gb/s. The probes create statistics fully compatible with NetFlow v5, v9 and IPFIX and send them to an embedded or external collector. Probes are connected to the network through a mirror port of the router/switch or by direct insertion of an optical or metallic tap (<em>TAP</em>). The FlowMon product series includes standard probe models for ordinary networks and hardware-accelerated models for critical and heavily loaded lines. The exporters of these probes can send data to several collectors and also filter them according to IP address ranges at the same time. An ISP which provides connectivity to several companies can thus hand over the relevant NetFlow data directly to each company, which can use them for applications such as those mentioned above. If NetFlow data need to be presented to third parties, they can be anonymised (IP addresses or ports can be modified) so that users are protected from potential misuse.</p>
<p>NetFlow data generated with the FlowMon probe are sent to an integrated or external collector. Any third-party application or a FlowMon monitoring centre that is part of the package can be used as a collector.</p>
<p>Publicly available software NetFlow agents can be downloaded from the Internet and installed on an ordinary computer. It is advisable to carefully optimise such agents so that no large packet losses occur. The first tunable parameter is the captured size of each packet (snap length): only the packet header must be processed during monitoring, so capturing the first 96 bytes is normally sufficient; this saves unnecessary memory allocation and speeds up the monitoring. It is also advisable to limit the maximum number of monitored flows. Such a countermeasure prevents the measuring process from exhausting the computational resources of the probe in critical situations (DoS). Further optimisation requires recompiling the operating system kernel so that the classic interrupt-driven reading of packets from the network card is replaced with constant polling of the card to check whether any packets are available. This removes a system bottleneck caused by an interrupt storm (in a classical system, one interrupt is generated per packet). After these optimisations, the probe can measure even Gigabit lines with normal traffic.</p>
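<p>The flow-limit countermeasure can be illustrated with a bounded flow cache. This is a toy sketch with invented names: real probes expire or force-export old flows rather than simply refusing new ones, but the point &#8211; a hard ceiling on memory regardless of how many unique 5-tuples a DoS attack generates &#8211; is the same.</p>
<pre>
```python
class BoundedFlowCache:
    """Flow table with a hard size limit, so a flood of unique 5-tuples
    (e.g. during a DoS attack) cannot exhaust the probe's memory."""

    def __init__(self, max_flows: int):
        self.max_flows = max_flows
        self.flows = {}      # 5-tuple -> packet count
        self.dropped = 0     # flows refused because the table was full

    def update(self, key):
        if key in self.flows:
            self.flows[key] += 1
        elif len(self.flows) < self.max_flows:
            self.flows[key] = 1
        else:
            self.dropped += 1  # a real probe would expire old flows instead
```
</pre>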
<p>nProbe (<a href="http://www.ntop.org/nProbe.html">http://www.ntop.org/nProbe.html</a>) is one of the popular, commercially available agents. Its distinctive features are export in the NetFlow v9 or IPFIX format and availability for both Unix and Windows systems. In addition to standard NetFlow/IPFIX items it also supports proprietary items focused on VoIP monitoring (specifically on the <em>SIP</em> and <em>RTP</em> protocols).</p>
<p>Publicly available software NetFlow agents include fprobe (<a href="http://fprobe.sourceforge.net/">http://fprobe.sourceforge.net/</a>), which provides data in NetFlow v5 only (and also in v1 and v7, which are not described in this report).</p>
<p>Once the proper probe is selected we need to focus on a collector. To choose the right collector we need to make sure that both agent and collector can process data in the chosen format: NetFlow or IPFIX. It pays off to be especially careful with the IPFIX protocol. Even though both exporter and collector may support IPFIX, this protocol is still under development and interoperability might be an issue.</p>
<p>An example of a typical publicly available collector with an advanced graphical interface is NfSen (<a href="http://nfsen.sourceforge.net">http://nfsen.sourceforge.net</a>), which is able to process the NetFlow v5 and v9 protocols. Another option is the multipurpose tool ntop (<a href="http://www.ntop.org">http://www.ntop.org</a>), which collects NetFlow and IPFIX with a special plug-in.</p>
<p>The benefit of commercial solutions is technical support and often sophisticated advanced functions such as automatic generation of detailed reports and detection of network anomalies and attacks. Popular collectors include Cisco NetFlow Analyzer (<a href="http://manageengine.adventnet.com/products/netflow/cisco-netflow.html">http://manageengine.adventnet.com/products/netflow/cisco-netflow.html</a>) and Caligare Flow Inspector (<a href="http://www.caligare.com">http://www.caligare.com</a>), or the FlowMon solution mentioned above.</p>
<p>The FlowMon monitoring centre is accessible through a secure web interface and it offers many options such as displaying network statistics as charts and tables with various time scales, generation of top-N statistics, filtering of data according to required criteria, creating user profiles, running security analyses, or setting the generation of automatic alerts to required events such as breaches of security policies. By using expansion modules these functions can be further extended to include SNMP monitoring or automatic anomaly detection.</p>
<h2>10 Future Outlook</h2>
<p>The development of applications based on flow monitoring keeps advancing. The search for and discovery of network incidents (port scanning, attacks, exceeded limits or faulty network configurations) is becoming automated. Some alerts can be generated by the NetFlow agent itself (record memory overflow can indicate a DoS attack), but thorough analysis of NetFlow data runs on a collector, most often at a fixed interval (typically 5 minutes).</p>
<p>Automated searching in NetFlow data for suspicious activities is usually based on exceeding limits for the allowed deviation from normal traffic. This means that after deployment the search method first learns what normal traffic looks like, and later flags unusual network behaviour.</p>
<p>The second approach is searching for known behaviour patterns of certain anomalies. This can be used to expose network scans, for instance. Such an activity appears as a line segment in a four-dimensional space built upon source IP address, destination IP address, source port and destination port (we draw a point in this space for each NetFlow record). By finding all such line segments we also find all scanning activities during a given interval.</p>
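<p>A horizontal scan, for instance, shows up as one source touching many distinct destination hosts on a single port with one-packet flows. A minimal detector over parsed flow records (field names are our own, and the threshold is an illustrative parameter, not a recommended value) might look like:</p>
<pre>
```python
from collections import defaultdict

def horizontal_scans(flows, min_targets=100):
    """Flag (source, destination-port) pairs that touch an unusually
    large number of distinct destination hosts with tiny flows."""
    targets = defaultdict(set)
    for f in flows:
        if f["pkts"] <= 1:              # scan probes carry ~1 packet
            targets[(f["src"], f["dport"])].add(f["dst"])
    return {k: len(v) for k, v in targets.items() if len(v) >= min_targets}
```
</pre>
<p>A vertical-scan detector is symmetric: aggregate by (source, destination host) and count distinct destination ports instead.</p>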
<p>The agents themselves are developing as well, being modified to deliver the best possible data about current network traffic. One example is embedding an application decoder into the agent&#8217;s monitoring process. Its task is to identify the application which generated the transferred data. This is simple for applications which use well-known ports. Unfortunately, if another application uses the same ports (the popular port 80 for <em>HTTP</em> traffic) or unassigned ports, it is misidentified or not identified at all. Blocked applications may exploit this and hide their traffic from the firewall or monitoring device behind the traffic of another application. If we expand the NetFlow monitoring process to include searching in the packet contents before the contents are discarded, it is possible to identify the communicating applications more precisely (a pattern to detect an SSH connection looks like this: ^ssh-[12]\.[0-9]). If an application has no significant pattern or encrypts its traffic, then pattern detection is rendered useless. As an alternative, analysis of flow features (statistical indexes) needs to be used. The measured data include the maximum, minimum, variance and average packet length, and the same indexes are measured for the intervals between packets of each flow. This can give you a behaviour fingerprint that you can use to estimate the type of application. For example, interactive voice communication would have the following fingerprint: regular intervals between packets up to 150 ms, low data volume, a duration of at least 10 seconds, and the same pattern in the other direction.</p>
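<p>The SSH pattern mentioned above is easy to apply to the first payload bytes of a flow. The sketch below uses the corrected regular expression; the function name is ours, and a real agent would of course match against captured payload rather than a literal string.</p>
<pre>
```python
import re

# SSH protocol banner: "SSH-1.x" or "SSH-2.0" at the start of the
# first payload, matched case-insensitively.
SSH_BANNER = re.compile(rb"^ssh-[12]\.[0-9]", re.IGNORECASE)

def looks_like_ssh(payload: bytes) -> bool:
    """Return True if the first payload bytes of a flow carry an SSH banner."""
    return SSH_BANNER.match(payload) is not None
```
</pre>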
<p>Statistics about inter-packet intervals are very valuable not only for detecting the application; they can also be used to measure QoS by determining the uneven delay between packets (so-called jitter).</p>
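<p>One simple way to quantify jitter from per-flow timestamps is the mean absolute deviation of inter-packet gaps from their average. This is an illustrative definition (RTP tools, for instance, use a slightly different smoothed estimator); timestamps are assumed to be in milliseconds.</p>
<pre>
```python
from statistics import mean

def jitter_ms(arrival_times_ms):
    """Estimate jitter as the mean absolute deviation of inter-packet
    gaps from their average, given packet arrival timestamps in ms."""
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    avg = mean(gaps)
    return mean(abs(g - avg) for g in gaps)
```
</pre>
<p>Perfectly regular traffic (constant gaps) yields zero jitter; the more irregular the gaps, the larger the value.</p>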
<p>The future of devices that measure flows lies in extending the monitoring statistics and implementing pattern detection in the packet contents. This will add valuable information to data reported by agents, which will allow related methods such as locating suspect traffic or measuring line quality to perform better.</p>
<h2>11 Conclusion</h2>
<p>Detailed network monitoring is becoming more important nowadays because the amount of illegal activity increases every year. Attackers have become professional, and making money is their motivation. It is useful to recall that attackers keep adapting to new challenges: they change the types of attacks and try to mask their activity behind legitimate traffic.</p>
<p>From this perspective, flow monitoring appears to be a robust and promising method which makes automated search and differentiation of network incidents possible. Moreover, the trend is to expand the functionality of agents and collectors (such as Cisco <em>MARS security system</em>) to provide reliable sources of data about network traffic.</p>
<div  class="x-author-box cf" ><h6 class="h-about-the-author">About the Author</h6><div class="x-author-info"><h4 class="h-author mtn">Martin Zadnik</h4><a href="http://www.fit.vutbr.cz/~izadnik/" class="x-author-social" title="Visit the website for Martin Zadnik" target="_blank"><i class="x-icon-globe"></i> http://www.fit.vutbr.cz/~izadnik/</a><span class="x-author-social"><i class="x-icon-envelope"></i> izadnik@fit.vutbr.cz</span><p class="p-author mbn"></p></div></div>
]]></content:encoded>
			<wfw:commentRss>http://6lab.cz/network-monitoring-based-on-ip-data-flows/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
