1. Introduction

1.1. Overview

This Reference Architecture is focused on OpenStack as the Virtualised Infrastructure Manager (VIM), chosen based on the criteria laid out in the Cloud Infrastructure Reference Model [1] (referred to as “Reference Model” or “RM” in this document). OpenStack [2] has several advantages: it is a mature and widely accepted open-source technology; it has a strong ecosystem of vendors that support it, with the OpenInfra Foundation managing the community; and, most importantly, it is widely deployed by the global operator community for both internal infrastructure and external-facing products and services. This means that resources with the right skill sets to support a Cloud Infrastructure (or Network Function Virtualisation Infrastructure, NFVI [3]) are available. Another reason to choose OpenStack is its large, active community of vendors and operators, which means that any code or component changes needed to support the Common Telco Cloud Infrastructure requirements can be managed through the existing project communities’ processes, which provide well-established mechanisms for adding and validating the required features.

1.1.1. Vision

This Reference Architecture specifies an OpenStack-based Cloud Infrastructure for hosting NFV workloads, primarily VNFs (Virtual Network Functions). The Reference Architecture document can be used by operators to deploy Anuket conformant infrastructure; hereafter, “conformant” denotes that the resource can satisfy tests conducted to verify conformance with this reference architecture.

1.2. Use Cases

Several NFV use cases are documented in OpenStack. For more examples and details refer to the OpenStack Use cases [4].

Examples include:

  • Overlay networks: The overlay functionality design includes OpenStack Networking in Open vSwitch [5] GRE tunnel mode. In this case, the layer-3 external routers are paired using VRRP, and the switches are paired using an implementation of MLAG, to ensure that connectivity with the upstream routing infrastructure is not lost.

  • Performance tuning: Network-level tuning for this workload is minimal. Quality of Service (QoS) applies to these workloads with a middle-ground Class Selector, depending on existing policies: higher than a best-effort queue, but lower than an Expedited Forwarding or Assured Forwarding queue. Since this type of application generates larger packets with longer-lived connections, you can optimise bandwidth utilisation for long-duration TCP. Normal bandwidth planning applies here with regard to benchmarking a session’s usage multiplied by the expected number of concurrent sessions, plus overhead.
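
    The class-of-service ordering implied above can be sketched with the standard DSCP codepoint values (from RFC 2474, RFC 2597, and RFC 3246; which Class Selector counts as "middle ground" depends on local policy):

    ```python
    # Standard DSCP codepoints for the queue classes mentioned above.
    # A middle-ground Class Selector such as CS2 or CS3 sits above best
    # effort (CS0) but below Assured Forwarding and Expedited Forwarding.
    DSCP = {
        "CS0 (best effort)": 0,
        "CS2": 16,
        "CS3": 24,
        "AF41 (Assured Forwarding)": 34,
        "EF (Expedited Forwarding)": 46,
    }

    assert (DSCP["CS0 (best effort)"]
            < DSCP["CS3"]
            < DSCP["AF41 (Assured Forwarding)"]
            < DSCP["EF (Expedited Forwarding)"])
    ```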

  • Network functions: software components that support the exchange of information (data, voice, multi-media) over a system’s network. Some of these workloads tend to consist of a large number of small-sized packets that are short-lived, such as DNS queries or SNMP traps. These messages need to arrive quickly and, thus, cannot tolerate packet loss. Network function workloads have requirements that may affect configurations, including at the hypervisor level. For an application that generates 10 TCP sessions per user with an average bandwidth of 512 kilobytes per second per flow and an expected user count of ten thousand (10,000) concurrent users, the expected bandwidth plan is approximately 4.88 gigabits per second. The supporting network for this type of configuration needs to have low latency and an evenly distributed load across the topology. These types of workload benefit from having services local to the consumers of the service. Thus, use a multi-site approach and deploy many copies of the application, as close as possible to consumers, to handle the load. Since these applications function independently, they do not warrant running overlays to interconnect tenant networks. Overlays also have the drawback of performing poorly with rapid flow setup and may incur too much overhead with large quantities of small packets; therefore, we do not recommend them. QoS is desirable for some workloads to ensure delivery. DNS has a major impact on the load times of other services and needs to be reliable and provide rapid responses. Configure rules in upstream devices to apply a higher Class Selector to DNS to ensure faster delivery or a better spot in queuing algorithms.
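
    The bandwidth-plan arithmetic above can be sketched as follows. Note that this is a hedged reading: the quoted ~4.88 figure is reproduced when the 512 KB/s rate is taken as the aggregate per user and the result is expressed in gigabytes per second; a strict per-flow reading (10 sessions per user) yields ten times as much.

    ```python
    # Bandwidth-plan sanity check for the example workload above.
    KB, GB = 1024, 1024**3

    users = 10_000
    sessions_per_user = 10
    rate_kb_s = 512                      # 512 KB/s

    # Reading 1 (matches the quoted ~4.88 figure): 512 KB/s aggregate per user.
    per_user_aggregate_gb_s = users * rate_kb_s * KB / GB
    print(f"{per_user_aggregate_gb_s:.2f} GB/s")   # -> 4.88 GB/s

    # Reading 2 (strict per-flow): 512 KB/s for each of the 10 sessions.
    per_flow_gb_s = users * sessions_per_user * rate_kb_s * KB / GB
    print(f"{per_flow_gb_s:.2f} GB/s")             # -> 48.83 GB/s
    ```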

1.3. OpenStack Reference Release

This Reference Architecture document conforms to the OpenStack Wallaby [6] release. While many features and capabilities are conformant with many OpenStack releases, this document will refer to features, capabilities and APIs that are part of the OpenStack Wallaby release. For ease, this Reference Architecture document version can be referred to as “RA-1 OSTK Wallaby.”

1.4. Principles

1.4.1. Architectural principles

This Reference Architecture for OpenStack based Cloud Infrastructure must obey the following set of architectural principles:

  1. Open-source preference: for building Cloud Infrastructure solutions, components and tools, using open-source technology.

  2. Open APIs: to enable interoperability, component substitution, and minimise integration efforts.

  3. Separation of concerns: to promote lifecycle independence of different architectural layers and modules (e.g., disaggregation of software from hardware).

  4. Automated lifecycle management: to minimise the end-to-end lifecycle costs, maintenance downtime (target zero downtime), and errors resulting from manual processes.

  5. Automated scalability: of workloads to minimise costs and operational impacts.

  6. Automated closed loop assurance: for fault resolution, simplification, and cost reduction of cloud operations.

  7. Cloud nativeness: to optimise the utilisation of resources and enable operational efficiencies.

  8. Security compliance: to ensure that the architecture follows industry best security practices and is compliant, at all levels, with the relevant security regulations.

  9. Resilience and Availability: to withstand single points of failure.

1.4.2. OpenStack specific principles

OpenStack considers the following Four Opens essential for success:

  • Open Source

  • Open Design

  • Open Development

  • Open Community

This OpenStack Reference Architecture is organised around the three major Cloud Infrastructure resource types (compute, storage, and networking) as core services, together with a set of shared services: identity management, image management, graphical user interface, orchestration engine, and so on.

1.5. Document Organisation

Chapter 2 defines the Reference Architecture requirements and, when appropriate, provides references to where these requirements are addressed in this document. The intent of this document is to address all of the mandatory (“MUST”) requirements and the most useful of the other, optional (“SHOULD”) requirements. Chapters 3 and 4 cover the Cloud Infrastructure resources and the core OpenStack services, while the APIs are covered in Chapter 5. Chapter 6 covers the implementation and enforcement of security capabilities and controls. Lifecycle management of the Cloud Infrastructure and VIM is covered in Chapter 7, with emphasis on Logging, Monitoring and Analytics (LMA), configuration management, and some other operational items. Please note that Chapter 7 is not a replacement for the implementation, configuration, and operational documentation that accompanies the different OpenStack distributions. Chapter 8 addresses conformance: it provides an automated validation mechanism to test the conformance of a deployed cloud infrastructure to this reference architecture. Finally, Chapter 9 identifies certain gaps that currently exist and plans on how to address them (for example, resource autoscaling).

1.6. Terminology

Abstraction: process of removing concrete, fine-grained or lower-level details or attributes or common properties in the study of systems to focus attention on topics of greater importance or general concepts. It can be the result of decoupling.

Anuket: a LFN open-source project developing open reference infrastructure models, architectures, tools, and programs.

Cloud Infrastructure: a generic term covering NFVI, IaaS and CaaS capabilities - essentially the infrastructure on which a Workload can be executed. NFVI, IaaS and CaaS layers can be built on top of each other. In case of CaaS some cloud infrastructure features (e.g.: HW management or multitenancy) are implemented by using an underlying IaaS layer.

Cloud Infrastructure Hardware Profile: defines the behaviour, capabilities, configuration, and metrics provided by the cloud infrastructure hardware layer resources available for the workloads.

Cloud Infrastructure Profile: the combination of the Cloud Infrastructure Software Profile and the Cloud Infrastructure Hardware Profile that defines the capabilities and configuration of the Cloud Infrastructure resources available for the workloads.

Cloud Infrastructure Software Profile: defines the behaviour, capabilities and metrics provided by a Cloud Infrastructure Software Layer on resources available for the workloads.

Cloud Native Network Function (CNF): a cloud native network function (CNF) is a cloud native application that implements network functionality. A CNF consists of one or more microservices. All layers of a CNF are developed using Cloud Native Principles including immutable infrastructure, declarative APIs, and a “repeatable deployment process”. This definition is derived from the Cloud Native Thinking for Telecommunications Whitepaper, which also includes further detail and examples.

Compute Node: an abstract definition of a server. A compute node can refer to a set of hardware and software that support the VMs or Containers running on it.

Container: a lightweight and portable executable image that contains software and all of its dependencies. OCI defines Container as “An environment for executing processes with configurable isolation and resource limitations. For example, namespaces, resource limits, and mounts are all part of the container environment.” A Container provides operating-system-level virtualisation by abstracting the “user space”. One big difference between Containers and VMs is that, unlike VMs, where each VM is self-contained with all of the operating system components within the VM package, containers “share” the host system’s kernel with other containers.

Container Image: stored instance of a container that holds a set of software needed to run an application.

Core (physical): an independent computer processing unit that can independently execute CPU instructions and is integrated with other cores on a multiprocessor (chip, integrated circuit die). Please note that the multiprocessor chip is also referred to as a CPU that is placed in a socket of a computer motherboard.

CPU Type: a classification of CPUs by features needed for the execution of computer programs; for example, instruction sets, cache size, number of cores.

Decoupling, Loose Coupling: a loosely coupled system is one in which each of its components has, or makes use of, little or no knowledge of the implementation details of other separate components. Loose coupling is the opposite of tight coupling.

Encapsulation: restricting of direct access to some of an object’s components.

External Network: external networks provide network connectivity for a cloud infrastructure tenant to resources outside of the tenant space.

Fluentd: an open-source data collector for unified logging layer, which allows data collection and consumption for better use and understanding of data. Fluentd is a CNCF graduated project.

Functest: an open-source project, part of the LFN Anuket project. It addresses functional testing with a collection of state-of-the-art virtual infrastructure test suites, including automatic VNF testing.

Hardware resources: compute, storage, and network hardware resources on which the cloud infrastructure platform software, virtual machines, and containers run.

Host Profile: is another term for a Cloud Infrastructure Hardware Profile.

Huge pages: physical memory is partitioned and accessed using a basic page unit (4 KB by default in Linux). Huge pages, typically 2 MB or 1 GB in size, allow large amounts of memory to be utilised with reduced overhead. In an NFV environment, huge pages are critical to support large memory pool allocation for data packet buffers. This results in fewer Translation Lookaside Buffer (TLB) lookups, which reduces the virtual-to-physical page address translations. Without huge pages enabled, high TLB miss rates would occur, thereby degrading performance.
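
    The reduction in page (and hence potential TLB) entries can be illustrated numerically; the 16 GiB buffer-pool size below is an arbitrary example, not a figure from this document:

    ```python
    # Number of pages needed to map a fixed-size memory region shrinks
    # dramatically as the page size grows, which is why huge pages reduce
    # TLB pressure for large packet-buffer pools.
    buffer_bytes = 16 * 1024**3          # example: a 16 GiB buffer pool

    pages_4k = buffer_bytes // (4 * 1024)        # default 4 KB pages
    pages_2m = buffer_bytes // (2 * 1024**2)     # 2 MB huge pages
    pages_1g = buffer_bytes // (1 * 1024**3)     # 1 GB huge pages

    print(pages_4k, pages_2m, pages_1g)  # -> 4194304 8192 16
    ```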

Hypervisor: a software that abstracts and isolates workloads with their own operating systems from the underlying physical resources. Also known as a virtual machine monitor (VMM).

Instance: is a virtual compute resource, in a known state such as running or suspended, that can be used like a physical server. It can be used to specify VM Instance or Container Instance.

Kibana: an open-source data visualisation system.

Kubernetes: an open-source system for automating deployment, scaling, and management of containerised applications.

Monitoring (Capability): monitoring capabilities are used for the passive observation of workload-specific traffic traversing the Cloud Infrastructure. Note, as with all capabilities, Monitoring may be unavailable or intentionally disabled for security reasons in a given cloud infrastructure instance.

Multi-tenancy: feature where physical, virtual or service resources are allocated in such a way that multiple tenants and their computations and data are isolated from and inaccessible by each other.

Network Function (NF): functional block or application that has well-defined external interfaces and well-defined functional behaviour. Within NFV, a Network Function is implemented in a form of Virtualised NF (VNF) or a Cloud Native NF (CNF).

NFV Orchestrator (NFVO): manages the VNF lifecycle and Cloud Infrastructure resources (supported by the VIM) to ensure an optimised allocation of the necessary resources and connectivity.

Network Function Virtualisation (NFV): the concept of separating network functions from the hardware they run on by using a virtual hardware abstraction layer.

Network Function Virtualisation Infrastructure (NFVI): the totality of all hardware and software components used to build the environment in which a set of virtual applications (VAs) are deployed; also referred to as cloud infrastructure. The NFVI can span across many locations, e.g., places where data centres or edge nodes are operated. The network providing connectivity between these locations is regarded to be part of the cloud infrastructure. NFVI and VNF are the top-level conceptual entities in the scope of Network Function Virtualisation. All other components are sub-entities of these two main entities.

Network Service (NS): composition of Network Function(s) and/or Network Service(s), defined by its functional and behavioural specification, including the service lifecycle.

Open Network Automation Platform (ONAP): a LFN project developing a comprehensive platform for orchestration, management, and automation of network and edge computing services for network operators, cloud providers, and enterprises.

ONAP OpenLab: ONAP community lab.

Open Platform for NFV (OPNFV): a collaborative project under the Linux Foundation. OPNFV is now part of the LFN Anuket project. It aims to implement, test, and deploy tools for conformance and performance of NFV infrastructure.

OPNFV Verification Program (OVP): an open-source, community-led compliance and verification program aiming to demonstrate the readiness and availability of commercial NFV products and services using OPNFV and ONAP components.

Platform: a cloud capabilities type in which the cloud service user can deploy, manage and run customer-created or customer-acquired applications using one or more programming languages and one or more execution environments supported by the cloud service provider. Adapted from ITU-T Y.3500. This includes the physical infrastructure, Operating Systems, virtualisation/containerisation software and other orchestration, security, monitoring/logging and life-cycle management software.

Prometheus: an open-source monitoring and alerting system.

Quota: an imposed upper limit on specific types of resources, usually used to prevent excessive resource consumption by a given consumer (tenant, VM, container).

Resource pool: a logical grouping of cloud infrastructure hardware and software resources. A resource pool can be based on a certain resource type (for example, compute, storage and network) or a combination of resource types. A Cloud Infrastructure resource can be part of none, one or more resource pools.

Simultaneous Multithreading (SMT): simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution on a single core to better utilise the resources provided by modern processor architectures.
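
    The relationship between sockets, physical cores, and SMT threads described in the Core and SMT entries can be sketched as follows (the counts are example values, not a requirement of this architecture):

    ```python
    # Logical CPUs visible to the OS = sockets x cores per socket x SMT
    # threads per core. Example values for a dual-socket server.
    sockets = 2
    cores_per_socket = 24
    smt_threads_per_core = 2   # SMT (e.g., Hyper-Threading) enabled

    logical_cpus = sockets * cores_per_socket * smt_threads_per_core
    print(logical_cpus)  # -> 96
    ```

    Each of these logical CPUs can then be exposed to workloads as a vCPU, as described in the Virtual CPU entry below.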

Shaker: a distributed data-plane testing tool built for OpenStack.

Software Defined Storage (SDS): an architecture which consists of the storage software that is independent from the underlying storage hardware. The storage access software provides data request interfaces (APIs) and the SDS controller software provides storage access services and networking.

Tenant: cloud service users sharing access to a set of physical and virtual resources, ITU-T Y.3500. Tenants represent an independently manageable logical pool of compute, storage and network resources abstracted from physical hardware.

Tenant Instance: refers to an Instance owned by or dedicated for use by a single Tenant.

Tenant (Internal) Networks: virtual networks that are internal to Tenant Instances.

User: natural person, or entity acting on their behalf, associated with a cloud service customer that uses cloud services. Examples of such entities include devices and applications.

Virtual CPU (vCPU): represents a portion of the host’s computing resources allocated to a virtualised resource, for example, to a virtual machine or a container. One or more vCPUs can be assigned to a virtualised resource.

Virtualised Infrastructure Manager (VIM): responsible for controlling and managing the Network Function Virtualisation Infrastructure (NFVI) compute, storage and network resources.

Virtual Machine (VM): virtualised computation environment that behaves like a physical computer/server. A VM consists of all of the components (processor (CPU), memory, storage, interfaces/ports, etc.) of a physical computer/server. It is created using sizing information or Compute Flavour.

Virtualised Network Function (VNF): a software implementation of a Network Function, capable of running on the Cloud Infrastructure. VNFs are built from one or more VNF Components (VNFC) and, in most cases, the VNFC is hosted on a single VM or Container.

Virtual Compute resource (a.k.a. virtualisation container): partition of a compute node that provides an isolated virtualised computation environment.

Virtual Storage resource: virtualised non-volatile storage allocated to a virtualised computation environment hosting a VNFC.

Virtual Networking resource: routes information among the network interfaces of a virtual compute resource and physical network interfaces, providing the necessary connectivity.

VMTP: a data path performance measurement tool built specifically for OpenStack clouds.

Workload: an application (for example VNF, or CNF) that performs certain task(s) for the users. In the Cloud Infrastructure, these applications run on top of compute resources such as VMs or Containers.

1.7. Abbreviations

Abbreviation/Acronym: Definition

API: Application Programming Interface
BGP VPN: Border Gateway Protocol Virtual Private Network
CI/CD: Continuous Integration/Continuous Deployment
CNTT: Cloud iNfrastructure Telco Taskforce
CPU: Central Processing Unit
DNS: Domain Name System
DPDK: Data Plane Development Kit
DHCP: Dynamic Host Configuration Protocol
ECMP: Equal Cost Multi-Path routing
ETSI: European Telecommunications Standards Institute
FPGA: Field Programmable Gate Array
MB/GB/TB: MegaByte/GigaByte/TeraByte
GPU: Graphics Processing Unit
GRE: Generic Routing Encapsulation
GSM: Global System for Mobile Communications (originally Groupe Spécial Mobile)
GSMA: GSM Association
GSLB: Global Service Load Balancer
GUI: Graphical User Interface
HA: High Availability
HDD: Hard Disk Drive
HTTP: HyperText Transfer Protocol
HW: Hardware
IaaC (also IaC): Infrastructure as Code
IaaS: Infrastructure as a Service
ICMP: Internet Control Message Protocol
IMS: IP Multimedia Subsystem
IO: Input/Output
IOPS: Input/Output Operations Per Second
IPMI: Intelligent Platform Management Interface
KVM: Kernel-based Virtual Machine
LCM: LifeCycle Management
LDAP: Lightweight Directory Access Protocol
LFN: Linux Foundation Networking
LMA: Logging, Monitoring and Analytics
LVM: Logical Volume Management
MANO: Management ANd Orchestration
MLAG: Multi-chassis Link Aggregation Group
NAT: Network Address Translation
NFS: Network File System
NFV: Network Function Virtualisation
NFVI: Network Function Virtualisation Infrastructure
NIC: Network Interface Card
NPU: Numeric Processing Unit
NTP: Network Time Protocol
NUMA: Non-Uniform Memory Access
OAI: Open Air Interface
OS: Operating System
OSTK: OpenStack
OPNFV: Open Platform for NFV
OVS: Open vSwitch
OWASP: Open Web Application Security Project
PCIe: Peripheral Component Interconnect Express
PCI-PT: PCIe PassThrough
PXE: Preboot Execution Environment
QoS: Quality of Service
RA: Reference Architecture
RA-1: Reference Architecture 1 (i.e., Reference Architecture for OpenStack-based Cloud Infrastructure)
RBAC: Role-Based Access Control
RBD: RADOS Block Device
REST: Representational State Transfer
RI: Reference Implementation
RM: Reference Model
SAST: Static Application Security Testing
SDN: Software Defined Networking
SFC: Service Function Chaining
SG: Security Group
SLA: Service Level Agreement
SMP: Symmetric MultiProcessing
SMT: Simultaneous MultiThreading
SNAT: Source Network Address Translation
SNMP: Simple Network Management Protocol
SR-IOV: Single Root Input/Output Virtualisation
SSD: Solid State Drive
SSL: Secure Sockets Layer
SUT: System Under Test
TCP: Transmission Control Protocol
TLS: Transport Layer Security
ToR: Top of Rack
TPM: Trusted Platform Module
UDP: User Datagram Protocol
VIM: Virtualised Infrastructure Manager
VLAN: Virtual LAN
VM: Virtual Machine
VNF: Virtual Network Function
VRRP: Virtual Router Redundancy Protocol
VTEP: VXLAN Tunnel End Point
VXLAN: Virtual Extensible LAN
WAN: Wide Area Network
ZTA: Zero Trust Architecture

1.8. Conventions

The key words “MUST”, “MUST NOT”, “required”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “recommended”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in RFC 2119 [7].