Topology Management Documentation

ToMaTo parts

Hostmanager Documentation

Hostmanager Installation

The hostmanager offers the virtualization technology found on the host to backends. To simplify installation, the hostmanager has been packaged for Debian systems.

Installation on Debian systems

All of the following commands must be issued as root.

  1. Adding the repository

ToMaTo has its own repository for Debian packages, which must be added before its packages can be installed.

echo 'deb http://packages.tomato-lab.org/deb stable main' > /etc/apt/sources.list.d/tomato.list
  2. Accepting the repository key

Since all packages are signed with keys, the repository key must be accepted once. Otherwise the Debian package manager will complain on every update that the package is unauthorized.

wget http://packages.tomato-lab.org/key.gpg -O - | apt-key add -
  3. Updating the package lists
apt-get update
  4. Installing the Hostmanager package
apt-get install tomato-hostmanager

During the configuration phase of this package, dialogs will appear and prompt for information. All of these prompts can be answered by pressing enter.

  5. Optional: Install the Updater package
apt-get install tomato-updater

This package will add a cronjob that keeps your ToMaTo installation automatically up-to-date.

Installation on Proxmox systems

The target platform for the hostmanager is Proxmox VE, so a meta-package exists specifically for Proxmox systems. It installs all additional software needed to use the full potential of Proxmox systems.

To install the hostmanager on Proxmox systems, steps 1 to 4 from above have to be executed. Additionally, the package tomato-host-proxmox has to be installed:

apt-get install tomato-host-proxmox
After the installation

Some steps are needed to finalize the installation:

Hostmanager Configuration

Hostmanager API

The hostmanager offers the following methods via an XML-RPC interface. The interface uses an encoding as documented in XML-RPC Interface. All of the methods can be called by their method names without modules, etc.

Elements
Connections
Accounting
Resources
Host

Backend management

The hostmanager allows different backends to use it.

Component separation

All components (i.e. elements and connections) of different backends are separated by the hostmanager. Each component will have an owner attribute, that references the backend that created it. The component will only be visible and accessible by that backend.

Resource separation

All resources (i.e. networks and templates) of different backends are separated by the hostmanager. Each resource will have an owner attribute, that references the backend that created it. The resource will only be visible and accessible by that backend.

Access control

The authentication of backends uses SSL keys. Each backend has a key of its own and uses it to authenticate and encrypt connections to hostmanagers. On the side of the hostmanager all backend keys must be present as files in PEM format in a specific directory (/etc/tomato/client_certs in default config).

Indexing the backend keys

After modifying the SSL keys, the certificate index must be rebuilt.

update-tomato-client-certs

Note that the hostmanager does not have to be restarted after rebuilding the index. Also note that the hostmanager will issue the command to rebuild the certificate index automatically when it is starting.

Backend identification

The identity of a backend is based on the common name (CN) in its certificate. Different certificates with the same common name field will be treated as the same backend and share access to components.
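
As an illustration of CN-based identification, the following Python sketch extracts the common name from a certificate in the dictionary form returned by Python's ssl.SSLSocket.getpeercert(). The helper name common_name and the sample certificates are made up for this example; the hostmanager's own lookup code may work differently.

```python
def common_name(cert):
    """Return the first commonName in the subject of a getpeercert()-style dict, or None."""
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                return value
    return None


# Two different certificates with the same CN (sample data, not real certificates)
cert_a = {"subject": ((("organizationName", "Lab A"),), (("commonName", "backend1"),))}
cert_b = {"subject": ((("organizationName", "Lab B"),), (("commonName", "backend1"),))}

# Both certificates map to the same backend identity
same_backend = common_name(cert_a) == common_name(cert_b)
```

Because only the CN is compared, two unrelated parties that choose the same CN would share components, so CNs should be assigned with care.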

Generating a key-pair

A self-signed key-pair can be created with the following command:

openssl req -new -x509 -days 1000 -nodes -out cert.pem -keyout key.pem

It is important to create a key without a password if the key is to be used for a backend.

Integrated Fileserver

The hostmanager has an integrated fileserver that allows ToMaTo users to upload and download data files, such as disk images and packet capture files, directly to and from the hostmanager without indirection via the backend.

Access

The fileserver uses the HTTP protocol on a port set in the config file. The public address of the host and the fileserver port can be obtained using the API call hostmanager.tomato.api.host.host_info().

Element types

OpenVZ (openvz, openvz_interface)
KVM with QM frontend (kvmqm, kvmqm_interface)
KVM with VirSH frontend (kvm)
Repy (repy, repy_interface)
Tinc VPN (tinc)
UDP Tunnel (udp_tunnel)

Connection types

Bridge (bridge)
Fixed bridge (fixed_bridge)

Other topics:

XML-RPC Interface

The interface is an RPC interface, i.e. it offers a set of methods that can be called (with parameters) and that return a value. The internal data format is XML but this should be transparent. The interface adheres to the XML-RPC standard with the following modifications:

  • It uses an extension for null-value encoding that is not part of the standard. This extension is part of many implementations since the absence of the feature is seen as a flaw in the standard.
  • It uses a special parameter encoding that allows for keyword arguments. If exactly two parameters are given, where the first one is a list and the second one is a key/value-map, the keyword mode is used. In the keyword mode, the first parameter (the list) is expanded and used as normal positional arguments and the second parameter (the key/value-map) is expanded and used as the keyword arguments.
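
The two modifications can be sketched with Python's standard xmlrpc.client module. This is an illustration only, not the hostmanager's actual decoding code: the helper split_params and the method name example_method are placeholders invented for this example.

```python
import xmlrpc.client


def split_params(params):
    """Split decoded XML-RPC parameters into (args, kwargs) using the keyword-mode rule."""
    if len(params) == 2 and isinstance(params[0], list) and isinstance(params[1], dict):
        # Keyword mode: expand the list as positional and the map as keyword arguments
        return list(params[0]), dict(params[1])
    # Normal mode: all parameters are positional
    return list(params), {}


# Encode a call in keyword mode; allow_none enables the <nil/> extension for None.
# "example_method" is a placeholder, not a real hostmanager method.
request = xmlrpc.client.dumps(([1, "a"], {"attrs": None}),
                              methodname="example_method", allow_none=True)

# Decode the request again and apply the keyword-mode rule
params, method = xmlrpc.client.loads(request)
args, kwargs = split_params(params)
```

One consequence of this rule is that a method taking exactly one list and one map as ordinary positional parameters cannot be told apart from a keyword-mode call.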

Web-Frontend Documentation

Command-Line-Client Documentation

Basic CLI functionality

cli.tomato.getLocals(api)

Combines the api with additional functionalities in one dictionary. It adds a list of all commands, a help method and a method to load and initialize python modules from a file.

Parameter api:
Connection to a host api
Return value:
This method returns a dictionary with a connection to an API, a help method and a method to load and initialize python modules from a file.
cli.tomato.parseArgs()

Defines required and optional arguments for the cli and parses them out of sys.argv.

Available Arguments are:
Argument --help:
Prints a help text for the available arguments
Argument --url:
The whole URL of the server
Argument --protocol:
Protocol of the server
Argument --hostname:
Address of the host of the server
Argument --port:
Port of the host server
Argument --ssl:
Whether to use ssl or not
Argument --client_cert:
Path to the ssl certificate of the client
Argument --username:
The username to use for login
Argument --password:
The password to use for login
Argument --file:
Path to a file to execute
Argument arguments:
Python code to execute directly
Return value:
Parsed command-line arguments
cli.tomato.run()

Parses the command-line arguments, opens an API connection and creates access to the available commands of the host. It decides based on the options whether to directly execute python code or to execute a file or to grant access to the interactive cli.

cli.tomato.runFile(locals, file, options)

Executes the given file in an interactive console that is created from locals and the command-line options.

Parameter locals:
Dict containing a connection to an API, a help function and a file load function.
Parameter file:
Path to the file which should be executed
Parameter options:
Command-line arguments which will be used to create an interactive console which executes the file.
cli.tomato.runInteractive(locals)

Creates an interactive console based on the locally available methods.

Parameter locals:
Dict containing a connection to an API, a help function and a file load function.
cli.tomato.runSource(locals, source)

Executes Python code using an interpreter based on the methods provided by the API found in locals.

Parameter locals:
Dict containing a connection to an API, a help function and a file load function.
Parameter source:
Source code to execute

Additional functionality

Upload / Download commands
cli.lib.createUrl(protocol, hostname, port, username=None, password=None)

Creates a URL for connecting to a server.

Parameter protocol:
Protocol of the server
Parameter hostname:
Address of the host of the server
Parameter port:
Port of the host server
Parameter ssl:
Boolean whether ssl should be used or not
Parameter username:
The username to use for login
Parameter password:
The password to use for login
Return value:
This method returns a full server URL.
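
A possible shape of such a helper is sketched below. The URL layout (credentials embedded before the hostname) is an assumption, as are the function name create_url, the host example.org and the port 8000; the real cli.lib.createUrl may build the URL differently.

```python
from urllib.parse import quote


def create_url(protocol, hostname, port, username=None, password=None):
    """Build protocol://[user[:password]@]hostname:port (assumed layout)."""
    auth = ""
    if username:
        # Percent-encode credentials so characters like "/" or "@" stay unambiguous
        auth = quote(username, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return "%s://%s%s:%s" % (protocol, auth, hostname, port)


# Placeholder values for illustration only
url = create_url("https", "example.org", 8000, "alice", "s3cret/pw")
plain = create_url("http", "example.org", 8000)
```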
cli.lib.getConnection(url, sslCert=None)

Creates a server proxy to a host using the given URL.

Parameter url:
URL of the server
Parameter sslCert:
SSL certificate to use for a ssl connection
Return value:
This method returns a server proxy object.
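
The following sketch shows how a server proxy with an optional client certificate could be created using Python 3's standard library. It is not the code of cli.lib.getConnection; the function name get_connection and the address example.org:8000 are placeholders, and it assumes that the certificate file contains both the certificate and its private key in PEM format.

```python
import ssl
import xmlrpc.client


def get_connection(url, ssl_cert=None):
    """Return an XML-RPC server proxy, optionally presenting a client certificate."""
    context = None
    if url.startswith("https"):
        context = ssl.create_default_context()
        if ssl_cert:
            # Assumption: one PEM file holding both certificate and private key
            context.load_cert_chain(ssl_cert)
    # allow_none matches the null-value extension used by the interface
    return xmlrpc.client.ServerProxy(url, allow_none=True, context=context)


# Creating the proxy does not open a connection yet; example.org is a placeholder.
proxy = get_connection("https://example.org:8000")
```

With the default context the server certificate is verified against the system trust store, so a server using a self-signed certificate would need additional trust configuration.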
cli.lib.tcpPortOpen(host, port)

Opens a connection to a remote socket at address (host, port) and closes it again to test whether the TCP port is open.

Parameter host:
Host address of the socket
Parameter port:
TCP port that will be checked
Return value:
This method returns a boolean which is true, if the TCP Port is open and false otherwise.
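
A port check of this kind can be sketched in a few lines with the socket module. The name tcp_port_open and the timeout parameter are assumptions for this example and not necessarily part of cli.lib.tcpPortOpen.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # Connect and close immediately; only the success of the handshake matters
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        return False
```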
Advanced link commands
cli.lib.misc.is_superset(obj1, obj2, path='')

Checks whether obj1 is a superset of obj2.

Parameter obj1:
Superset object
Parameter obj2:
Subset object
Parameter path:
Should be “” in initial call
Return value:
Returns a tuple with two elements. The first element is a boolean which is true if obj1 is a superset of obj2, false otherwise. The second element is a string if the first element is false. The string contains the reason why obj1 is not a superset of obj2.
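
The recursive idea behind such a check can be sketched as follows. This is an illustrative reimplementation that only recurses into dicts and compares everything else by equality; the real function may treat lists and other types differently, and the wording of the reason strings is invented here.

```python
def is_superset(obj1, obj2, path=""):
    """Return (True, None) if obj1 contains everything in obj2, else (False, reason)."""
    if isinstance(obj2, dict):
        if not isinstance(obj1, dict):
            return False, "%s: expected a dict" % path
        for key, value in obj2.items():
            if key not in obj1:
                return False, "%s/%s: key is missing" % (path, key)
            # Descend into the value and extend the path for error reporting
            ok, reason = is_superset(obj1[key], value, "%s/%s" % (path, key))
            if not ok:
                return False, reason
        return True, None
    if obj1 != obj2:
        return False, "%s: %r != %r" % (path, obj1, obj2)
    return True, None


ok, reason = is_superset({"a": 1, "b": {"c": 2, "d": 3}}, {"b": {"c": 2}})
bad, why = is_superset({"a": 1}, {"a": 2})
```

The path parameter accumulates the location inside the structure, which is why it should be empty in the initial call.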

Checks the availability of a link by trying to reach it up to a given number of times.

Parameter id:
ID of device which should be reached
Parameter ip:
IP address of the ping target
Parameter tries:
Number of tries
Parameter waitBetween:
Time between each try
Return value:
Returns a boolean which is true, if the link was available within the number of tries, false otherwise.
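
The retry behaviour described above can be sketched generically. The function name wait_until_available and the probe callable are stand-ins for this example; in the real command the probe would be a ping from the given device to the target IP address.

```python
import time


def wait_until_available(probe, tries=5, wait_between=1.0):
    """Call probe() up to `tries` times; return True as soon as it succeeds."""
    for attempt in range(tries):
        if probe():
            return True
        if attempt < tries - 1:
            # Pause before the next try, but not after the last one
            time.sleep(wait_between)
    return False


# Stand-in probe that succeeds on the third call
calls = []


def fake_probe():
    calls.append(1)
    return len(calls) >= 3


available = wait_until_available(fake_probe, tries=5, wait_between=0)
```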

Configures a link by modifying certain attributes.

Parameter top:
Topology in which the link can be found
Parameter con:
Link which should be modified
Parameter c:
Target interface
Parameter attrs:
Key value pair of attributes which should be configured

Pings a target IP address from a certain device and returns the results. The number of samples and the maximum wait time for responses can be set. A one-way adaptation of the results is also possible.

Parameter id:
ID of device which should be used.
Parameter ip:
IP address of the ping target.
Parameter samples:
Number of messages to send.
Parameter maxWait:
Time to wait for a response in seconds.
Parameter oneWayAdapt:
Change results to a one-way adaptation.
Return value:
The return value of this method is a dict containing information about the route between the link and the destination.
lossratio
The loss ratio of the route between the link and the destination.
delay
The average round-trip time.
delay_stddev
The average standard deviation for the delay.
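
How the three result fields relate to the individual ping samples can be sketched as follows. The function summarize_ping, the input format (one round-trip time per sent message, None for a lost one) and the use of the population standard deviation are assumptions for this example; the one-way adaptation is not covered.

```python
import statistics


def summarize_ping(rtts):
    """rtts: non-empty list with one round-trip time per sent message, None if lost."""
    received = [rtt for rtt in rtts if rtt is not None]
    result = {"lossratio": 1.0 - len(received) / len(rtts)}
    if received:
        # Delay statistics are only defined if at least one reply arrived
        result["delay"] = statistics.mean(received)
        result["delay_stddev"] = statistics.pstdev(received)
    return result


stats = summarize_ping([10.0, 20.0, None, 30.0])
```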

Other topics

Packet Capturing

Packet capturing can help to trace packets through the network and analyze communication streams.

ToMaTo capabilities

ToMaTo supports capturing of packets on connections on Tinc-based connectors. The capturing can be enabled in the graphical editor in the properties panels of the connections. The captured packets are saved to a rotating set of files holding at most 50 MB of data. The capture files can be downloaded by clicking the “download capture” button in the control panel of the connection.

The timestamps in the capture files do not exactly correspond to the time of sending the packet in the virtual machine since the scheduling might introduce a delay. However, each timestamp is guaranteed to be between the time of sending and the time of the forwarding to the connection.

Also note that timestamps from different hosts might have a certain offset, depending on how well the clocks of the hosts are synchronized. In the German-Lab testbed currently no actions are taken to synchronize the clocks among the hosts.

Analysis programs

ToMaTo generates capture files in the pcap format. When downloaded from the hosts multiple capture files are packed into a tar.gz archive.

The capture files created by ToMaTo can be used by many different programs:

  • Wireshark - a graphical pcap explorer and analysis tool
  • Cloudshark - a web-based pcap explorer with a similar UI to Wireshark
  • tcpreplay - a Linux tool to replay pcap files

Device Templates

Template distribution

The device templates are distributed using the bittorrent protocol. This way the templates can be distributed among the hosts without a noteworthy central component.

For the bittorrent distribution, the backend and all hosts run a bittorrent client that automatically downloads and uploads the contents of torrent files in a certain directory. The backend also runs a so-called bittorrent tracker, i.e. a central registry for the bittorrent protocol that keeps track of all available peers for a torrent file.

The backend periodically checks that all templates are known to all hosts and are up-to-date. Otherwise the backend will create resource entries for the missing templates containing the torrent information. Since the torrent information has a size of several KiB (depending on the content size) the host will include an MD5 hash in its information and the backend will only update the torrent information when it does not match the hash.
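
The hash comparison described above can be sketched with hashlib. The function names and the sample data are invented for this example, and it assumes that the hash is computed over the raw bytes of the torrent file.

```python
import hashlib


def torrent_hash(torrent_data):
    """MD5 hex digest of the raw torrent file contents (bytes)."""
    return hashlib.md5(torrent_data).hexdigest()


def needs_update(backend_torrent, host_reported_hash):
    """True if the host's copy differs, so the full torrent data must be sent."""
    return torrent_hash(backend_torrent) != host_reported_hash


# Sample data standing in for real torrent files
current = b"example torrent data, version 2"
outdated = b"example torrent data, version 1"

up_to_date = needs_update(current, torrent_hash(current))
stale = needs_update(current, torrent_hash(outdated))
```

Comparing a short digest instead of the full torrent information keeps the periodic check cheap; MD5 is used here only for change detection, not as a security measure.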

The host will periodically check the file size of the templates and compare them to the information given in the torrent file to determine if the download has finished.

Template setup

To make templates easier to use, they should follow some common principles.

  1. Templates should be secure by default. This means that by default templates should only run services that are essential to the function of the template. For Linux templates that means that the SSH server will be deactivated by default and has to be activated manually by the user.
  2. Templates should only contain needed modifications. This means that a template should match the default version of the operating system except where adaptations are needed. This should help users that are already familiar with the operating system to use a template of that OS.
  3. Templates should be as small as possible. This means that templates will be compressed to save space and only include useful software. For some operating systems this might mean to remove some drivers and software that will never be used, to save space.
  4. Templates should be international but work in Germany. The language for all templates should be set to American English but the keyboard layout should be set to German.
  5. Templates should be self-explaining and helpful. That means that templates should contain some documentation on their special features and how to use them.
  6. Templates should not assume internet access. Without internet access the templates should still work unless they explicitly require an external service in the internet.
  7. Templates should use DHCP. All existing interfaces should be configured using DHCP. Hostname, DNS and time servers should also be used if included in the DHCP offer.
  8. Templates should require no login for local users and use a default password. This means that local users (via VNC) should be logged in directly without entering a username or password. If a password is needed for some actions, the password should be the same for all templates. Note that templates still must be secured against the network and require passwords for non-local login.
  9. Templates should include useful tools. Not all devices will have internet access so templates should already include the most useful tools that users want installed. There is a clear trade-off between keeping a template small and including useful tools.
  10. Templates should be updated regularly. This is important in two cases:
    1. If the device has internet access, it is important that the template is up-to-date so that it is initially secure. After device preparation, the user will have the responsibility to keep the system updated but it should be secure to start with.
    2. If the device does not have internet access, it is important that the template is not outdated because the user cannot easily update it.

Template generation

Scripts that can help to create and clean up templates can be found in the repository in the directory contrib. The scripts create_kvm_template.sh and create_openvz_template.sh can be used to create templates for Debian-based systems in a semi-automatic way. The script prepare_vm.sh can be used to adapt a running system to be a proper template.

Torrent creation

Torrent files for templates can be created using the command

btmakemetafile TRACKERURL FILENAME

where TRACKERURL is the URL of the tracker and FILENAME is the name of the template file. The result will be a torrent file, that is named like the template file with .torrent appended.

Note

The ToMaTo backend includes a tracker that can be used for template torrents. Its URL can be determined by the backend API call backend.tomato.api.host.server_info().

Accounting data types

Usage record

A usage record represents a data set with usage statistics of a certain time range. Each usage record has the following fields:

type:
The type of a usage record describes the time frame which the record covers. Possible values are single for a single measurement, 5minutes, hour, day, month and year for aggregated values.
begin and end:
The fields begin and end describe the covered time of the measurement. For a single measurement these fields specify the begin and the end of the measurement execution for this single data point. For all other record types these fields specify the time frame of aggregated measurements. Both fields are timestamps in the form of seconds since the epoch (1970-01-01 00:00:00).
measurements:
The field measurements contains the number of single data points that are combined in the record. This field is 1 for all single records and contains the number of aggregated single records for all other types.
usage:

This field describes the resource usage during the given time period. For single records this time period is the period between the measurement and the last single measurement. For all other types, the time period is the combined time period of the aggregated single records. The field usage is a dict containing the following fields:

cputime:
This field contains the used CPU time in seconds as a float value. CPU time is automatically measured by the operating system, so the measurement results do not depend on measurement timing. Even bigger gaps in measurement do not cause inaccurate values. Note that CPU time is calculated per core, so it is possible to consume several seconds of CPU time during one second.
memory:
This field contains the used memory (RAM) in bytes. Memory consumption is measured on certain measurement points and the different data points are averaged into aggregated values.
diskspace:
This field contains the used disk space in bytes. The measurement is similar to that of the memory field.
traffic:
This field contains the traffic volume in bytes. Like cputime, traffic is automatically measured by the operating system and the values are very accurate because of this.

Note that because of the different nature of the resources, cputime and traffic are summed up during aggregation while memory and diskspace are averaged.
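
The aggregation rule can be sketched as follows. The function aggregate and the sample records are invented for this example; weighting the averages by the measurements field is an assumption, as the exact averaging method is not specified here.

```python
def aggregate(records, record_type):
    """Combine a non-empty list of usage records into one record of the given type."""
    total = sum(r["measurements"] for r in records)
    usage = {}
    for field in ("cputime", "traffic"):
        # Consumption counters are added up
        usage[field] = sum(r["usage"][field] for r in records)
    for field in ("memory", "diskspace"):
        # Level values are averaged, weighted by the number of data points
        usage[field] = sum(r["usage"][field] * r["measurements"] for r in records) / total
    return {
        "type": record_type,
        "begin": min(r["begin"] for r in records),
        "end": max(r["end"] for r in records),
        "measurements": total,
        "usage": usage,
    }


# Two made-up single records, one minute apart
first = {"type": "single", "begin": 0.0, "end": 1.0, "measurements": 1,
         "usage": {"cputime": 2.0, "traffic": 100.0, "memory": 1000.0, "diskspace": 500.0}}
second = {"type": "single", "begin": 60.0, "end": 61.0, "measurements": 1,
          "usage": {"cputime": 3.0, "traffic": 50.0, "memory": 3000.0, "diskspace": 500.0}}

combined = aggregate([first, second], "5minutes")
```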

Usage statistics

The usage statistics data structure contains a set of usage records. Usage statistics objects are dict structures that contain the types of the usage records as keys and a list of usage records with that type as value.

Example

{
  "5minutes": [],
  "hour": [],
  "month": [],
  "single": [
    {
      "usage": {
        "traffic": 0.0,
        "cputime": 0.0,
        "diskspace": 19285.0,
        "memory": 0.0
      },
      "type": "single",
      "begin": 1351241166.88561,
      "measurements": 1,
      "end": 1351241166.89418
    },
    {
      "usage": {
        "traffic": 0.0,
        "cputime": 0.0,
        "diskspace": 19285.0,
        "memory": 0.0
      },
      "type": "single",
      "begin": 1351241106.80326,
      "measurements": 1,
      "end": 1351241106.81239
    },
    {
      "usage": {
        "traffic": 0.0,
        "cputime": 0.0,
        "diskspace": 19285.0,
        "memory": 0.0
      },
      "type": "single",
      "begin": 1351241226.93197,
      "measurements": 1,
      "end": 1351241226.94053
    },
    {
      "usage": {
        "traffic": 0.0,
        "cputime": 0.0,
        "diskspace": 19285.0,
        "memory": 0.0
      },
      "type": "single",
      "begin": 1351241286.97878,
      "measurements": 1,
      "end": 1351241286.98769
    }
  ],
  "year": [],
  "day": []
}
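
The timestamps in such a structure can be read with Python's datetime module. The short sketch below uses the first single record from the example above (usage omitted) and interprets the timestamp as UTC, which is an assumption.

```python
from datetime import datetime, timezone

# First "single" record from the example above, usage field omitted
record = {"type": "single", "begin": 1351241166.88561, "end": 1351241166.89418,
          "measurements": 1}

# Convert seconds since the epoch into a timezone-aware datetime
begin = datetime.fromtimestamp(record["begin"], tz=timezone.utc)

# For a single record, end - begin is the time the measurement itself took
duration = record["end"] - record["begin"]
```

Note that the single records in the example are not sorted by their begin field, so consumers that need chronological order should sort the list themselves.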

Repy

Repy is a Turing-complete subset of Python that is designed to run in a sandboxed environment.

Python and Repy

The Python programming language is documented at docs.python.org/reference. Repy is a reduced version of the Python programming language that allows scripts to run in a sandboxed environment. Repy is part of the Seattle Testbed and is documented extensively in the Seattle Wiki.

Difference between Repy and Python

  • No imports, no external libraries. The import statement is forbidden in Repy. Some functionality from Python libraries is made available via special identifiers. (see below)
  • No global variables. Instead Repy has a dictionary mycontext that can be used to store global variables.
  • No user input via input or raw_input.
  • Some Python builtins are not available. The most important are
    • print
    • eval and execfile
    • lambda
    • reload
    • reversed and sorted
    • staticmethod
    • super
    • unicode
    • yield
    • hasattr, getattr and setattr
  • Parameters are passed as callargs instead of sys.argv and start with index 0 instead of 1 (sys.argv[0] is the script itself).

Methods available to Repy scripts

Output methods
echo(message)

will print the message (followed by a newline) to the console.

will print an exception and a stack trace to the console.

Threading/Locking methods
createlock()
getthreadname()
createthread()
Misc. methods
exitall()
sleep(time)
randombytes()
getruntime()
getlasterror()
Networking methods
tuntap_read(dev, timeout=None)

will read one packet from the given network device dev and return this packet as a byte string. The method will block until a packet arrives at the device but at most timeout seconds (forever if timeout=None). If no packet has been received before the timeout, None will be returned. It is an error if the device does not exist.

tuntap_read_any(timeout=None)

will read one packet from any network device and return it. The return value will be a tuple (dev, packet) of the incoming device and the packet as a byte string. The first packet that arrives at a network device will be returned. The method will block until a packet arrives at a device but at most timeout seconds (forever if timeout=None). If no packet has been received before the timeout, (None, None) will be returned. It is an error to call this method if no network devices exist.

tuntap_send(dev, data)

will send the packet data via the network device dev. The packet must be a byte string. It is an error if the device does not exist.

tuntap_list()

will return a list of all available network devices.

tuntap_info(dev)

will return a dictionary containing detailed information about the networking device dev.
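
The networking methods can be combined into a simple hub-like forwarding step, sketched below. To keep the sketch runnable as plain Python outside the Repy sandbox, the networking functions are passed in as parameters and replaced by stand-ins; in a Repy script one would pass tuntap_read_any, tuntap_send and tuntap_list instead. The function forward_once and the device names are invented for this example.

```python
def forward_once(read_any, send, list_devices, timeout=1.0):
    """Read one packet and re-send it on every other device; return the number of copies."""
    dev, packet = read_any(timeout)
    if dev is None:
        return 0  # timeout, nothing was received
    sent = 0
    for other in list_devices():
        if other != dev:
            send(other, packet)
            sent += 1
    return sent


# Stand-ins for the Repy networking methods (device names are made up)
outbox = []


def fake_read_any(timeout=None):
    return ("eth0", b"\x01\x02")


def fake_send(dev, data):
    outbox.append((dev, data))


def fake_list():
    return ["eth0", "eth1", "eth2"]


copies = forward_once(fake_read_any, fake_send, fake_list)
```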

Struct

The struct library is available via struct (no import needed). This library can be used to encode and decode binary data structures.
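
As an example of decoding binary data, the following sketch splits a raw Ethernet frame into its header fields. It is written as plain Python and assumes that the struct object in Repy offers the same pack and unpack interface as Python's struct module; the function parse_ethernet and the sample frame are invented for this example.

```python
import struct  # plain Python needs this import; in Repy struct is available without it


def parse_ethernet(frame):
    """Split a raw Ethernet frame into (dst_mac, src_mac, ethertype, payload)."""
    # "!" = network byte order, 6s = 6-byte MAC address, H = 16-bit unsigned integer
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]


# Broadcast frame with an IPv4 ethertype (0x0800) and a dummy payload
frame = b"\xff" * 6 + b"\x02\x00\x00\x00\x00\x01" + struct.pack("!H", 0x0800) + b"payload"
dst, src, ethertype, payload = parse_ethernet(frame)
```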

ToMaTo library

The tomato library contains implementations of protocols and nodes. This library is extensible, so please feel free to contribute.

Databases

The ToMaTo backend needs a database to store information about hosts, topologies and users. The choice of this database is important for the performance of the ToMaTo backend.

ToMaTo uses Django as a database backend so the Django database documentation applies to ToMaTo as well.

SQLite

Note that SQLite lacks some features of real databases and thus is not suitable for running or developing ToMaTo.

PostgreSQL

PostgreSQL is the database that is used in the German-Lab installation. It is a full-featured database with good performance.

Raising the connection limit

The default database connection limit of PostgreSQL is set to 100, which can be reached by ToMaTo if several users are running a lot of tasks in parallel. In the config file postgresql.conf the value max_connections can be raised to allow more concurrent connections. If the postgres server then hits the shared memory limit, the sysctl value kernel.shmmax needs to be increased. (See the PostgreSQL documentation for more details.)

Exporting and importing the ToMaTo data

The manage.py script that comes with the ToMaTo backend can be used to dump and load the database contents in a generic database-agnostic format. These commands might only work when run as user tomato, so sudo -u tomato ./manage.py .... Also note that the commands only work when the database is up-to-date with the current layout in the code. (See migration for details)

Dumping the database

The following command dumps the database to a file named dump.json in the current directory:

$ ./manage.py dumpdata tomato south > dump.json
Loading the data into the database

The following command loads a dump from a file named dump.json into the database. (Note that the file extension must be omitted in this command.)

$ ./manage.py loaddata dump

Database migrations

As the database layout of ToMaTo changes, the database must be migrated to the new layout. ToMaTo automatically migrates old database schemas to the newest one. (Backups should still be made before the migration.)

Glossary

API
API is short for “Application programming interface”. Both the backend and the hostmanager provide APIs with which they can be used.
Backend
The backend is one of the parts of ToMaTo. It is the central part that manages all resources, hosts, topologies and users. In a ToMaTo-setup the backend is the only part that can only exist once. The backend uses the capabilities of one or more hosts to offer topologies to its users through one or more frontends.
CLI
The command-line interface is a simple way to control both the backend and the hostmanager. It is one of the frontends.
Component
A component is either an element or a connection.
Connection
A connection is a relation between exactly two elements. The connection can have attributes of its own.
Connector
This is a term from an older version of ToMaTo. Connectors are elements that are network elements.
Device
This is a term from an older version of ToMaTo. A device is an element that is a virtual machine.
Dict
Dicts are key-value mappings in the Python programming language. In a dict, each key has a value assigned to it. When used in an API, the keys are limited to strings and the values are limited to serializable objects (numbers, strings, booleans, None, lists, dicts).
Element
An element refers to virtual objects that the user can control. This includes end systems like virtual machines and scripts as well as networking components like switches, hubs or routers. Each element can have several attributes and child elements (VMs have network interfaces as child elements) and one connection.
Entity
An entity is a very general term for something that is controlled by either the hostmanager or the backend. This includes elements, connections, topologies, users, resources and much more.
Frontend
ToMaTo frontends are a part of ToMaTo. Frontends connect to the backend and give users access to its capabilities by using the backend API and the user's credentials. Several frontends exist (e.g. web-frontend and CLI) and can access a backend in parallel.
Host
A host is a physical machine (computer or server) that hosts parts of topologies. Each host is managed by one host manager.
Host manager (or Hostmanager)
The host manager is one of the parts of ToMaTo. It has exclusive control over one host and offers its capabilities to one or more backends.
KVM
KVM devices are heavy-weight virtual machines that emulate a whole computer with generic hardware. Most things that are possible on physical computers are also possible on KVM. Most operating systems run on KVM.
OpenVZ
OpenVZ devices are light-weight virtual machines that translate kernel calls to kernel calls of the host kernel. OpenVZ offers complete usermode access to the virtual machines and a limited kernel-mode access.
Profile
Profiles define the resource boundaries for virtual machine elements. Depending on the VM technology, profiles define different attributes like RAM limit, disk space and number of CPUs.
Repy
Programmable devices are essentially scripts that can work with networking packets. These scripts can be written in a Python dialect called Repy and can read and write raw Ethernet packets to/from their network interfaces. Programmable devices are very light-weight as they are just small Python scripts.
Resource
Resources are a generic entity type for things that are present at hosts and can be used by elements. This includes templates, external networks but also available port numbers.
Template
Templates are pre-installed disk images for virtual machine elements. Depending on the VM technology different templates with different operating systems and software exist. For Repy the template is the actual script that should be executed.
Topology
A topology is a virtual network containing topology components (i.e. elements and connections). For the user, a topology is a virtual world in which experiments can be run.

Missing topics

General

  • Usage calculation
  • Accounts & Providers
  • Logging

Backend

  • API
    • Host
  • Installation
  • Configuration
  • Permissions

Hostmanager

  • API
    • Connections

Frontends

CLI
  • Usage
  • Parameters
  • Examples
Webfrontend
  • Installation
  • Configuration
