Gaps, Innovation, and Development
=================================

Introduction to Gaps, Innovation, and Development
--------------------------------------------------

There are functional gaps between the current state of the technology available in open source and the requirements of this Reference Architecture or the Reference Model. This chapter describes these gaps in detail and proposes solutions. As a result, various "upstream" community projects may be identified and targeted for development efforts.

Gap template
~~~~~~~~~~~~

**Related requirements:** List the requirement references ``abc.xyz.00`` from RA2 or RM which this gap tries to address.

**Baseline project:** Describe against which upstream project the gap exists, for example, *Kubernetes*. If the gap is not against any specific upstream project, state "none".

**Gap description:** Describe which functionality described in the related requirements is currently missing in the implementations that you are aware of. Include references to ongoing work in the target project which may address the gap.

.. Container run-time Interfaces towards NFVI resources
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

   (unclear) This is the southbound interface from the container to the infrastructure resources provided by the IaaS provider, e.g., the network interface type that is presented to a running container.

Multitenancy and workload isolation with Kubernetes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Related requirements:** ``e.man.004``, ``sec.ci.008``, ``sec.wl.005``, ``sec.wl.006``

**Baseline project:** *Kubernetes*

**Gap description:** Today, Kubernetes lacks hard multitenancy capabilities that would allow untrusted tenants to share infrastructure resources. This presents a security problem when operators seek to separate workloads by categorization, or simply production from non-production. Furthermore, tenant networks need to be segregated, yet still centrally administered and maintained. Beyond security, this also presents an operational problem. Deploying too many CNFs in the same cluster can result in version conflicts, configuration conflicts, and problems with software lifecycle management. Finally, without proper isolation, there is an increased risk of cascading failures.

**Proposals & Resolution:** Kubernetes is not a single-cluster solution. This has been demonstrated across the industry, from case studies at prominent companies such as :cite:t:`alibaba-blog-twitter`, :cite:t:`youtube-zalando`, and :cite:t:`cncf-blog-alibaba`, to the biannual CNCF survey, which finds that the number of clusters deployed within an organization continues to grow. While there are many reasons behind the multicluster paradigm, a multicluster approach addresses many of the problems identified in the gap above, such as security and software lifecycle management. Without multitenancy within a cluster, separate clusters must be used to provide adequate separation for CNFs that require strong isolation. CNFs may need to be separated for various reasons, including different types of workloads based on their vendors, environments such as production versus non-production, categorization, or independent lifecycles. Having multiple clusters in which to deploy CNFs allows operators to group similar CNFs together while segregating those with different lifecycles from each other. CNFs deployed in the same cluster can be upgraded together to reduce the operational load, while CNFs that require different versions, configurations, and dependencies can run in separate clusters and be upgraded independently, if necessary. If running multiple clusters is the only way to meet these workload and infrastructure requirements, the operational burden of this model must also be considered. Running a multitude of clusters at scale can be a massive operational challenge if done manually. Any operator considering running Kubernetes at scale should carefully evaluate their multicluster management strategy, including the management of the applications within those clusters.
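To illustrate the namespace-scoped ("soft") isolation that Kubernetes does provide today, the following is a minimal sketch combining a ``ResourceQuota`` and a default-deny ``NetworkPolicy`` for a hypothetical ``tenant-a`` namespace; the namespace name and the quota values are illustrative assumptions, not part of this Reference Architecture.

.. code-block:: yaml

   # Illustrative only: namespace-scoped ("soft") isolation for a hypothetical tenant.
   apiVersion: v1
   kind: Namespace
   metadata:
     name: tenant-a
   ---
   apiVersion: v1
   kind: ResourceQuota
   metadata:
     name: tenant-a-quota
     namespace: tenant-a
   spec:
     hard:
       requests.cpu: "20"        # example limits; values are assumptions
       requests.memory: 64Gi
       pods: "50"
   ---
   apiVersion: networking.k8s.io/v1
   kind: NetworkPolicy
   metadata:
     name: default-deny-ingress
     namespace: tenant-a
   spec:
     podSelector: {}             # applies to all Pods in the namespace
     policyTypes:
       - Ingress                 # deny all ingress traffic unless explicitly allowed

Such controls stop at the namespace boundary: all tenants still share the same API server, kernel, and cluster-scoped resources. This is why workloads that require stronger isolation guarantees currently have to be placed in separate clusters, as discussed above.
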
Kubernetes as a VM-based VNF orchestrator
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Related requirements:** None.

**Baseline project:** *Kubernetes*, *KubeVirt*

**Gap description:** Kubernetes and at least one CRI-compliant runtime should support the running of VNFs without requiring changes to the VNF's architecture and deployment artifacts.

Native multiple network interfaces on Pods
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Related requirements:** Virtual Network Interface Specifications section in Chapter 4 of :cite:t:`refmodel`

**Baseline project:** *Kubernetes*

**Gap description:** Kubernetes does not natively support multiple Pod interfaces. Therefore, a CNI multiplexer, such as :cite:t:`github-multus`, is needed to provision multiple interfaces. The implementation of different network services for these interfaces, such as network policies, Ingress, Egress, or load balancers, depends on the feature set of the CNI multiplexer and the CNI plugins it uses, and is therefore inconsistent.

**Status:** A :cite:t:`googledocs-kep-multi-network-pod-object` has been created to add native support for multiple Pod interfaces.
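To make the gap concrete, the sketch below shows how a secondary interface is typically attached today with a CNI multiplexer such as Multus: a ``NetworkAttachmentDefinition`` wraps the delegate CNI configuration, and the Pod requests it through an annotation rather than through a native Kubernetes API. The network name, CNI plugin, addressing, and image are illustrative assumptions.

.. code-block:: yaml

   # Illustrative only: attaching a secondary Pod interface via a CNI multiplexer (for example, Multus).
   apiVersion: k8s.cni.cncf.io/v1
   kind: NetworkAttachmentDefinition
   metadata:
     name: signalling-net                          # hypothetical secondary network
   spec:
     config: |
       {
         "cniVersion": "0.3.1",
         "type": "macvlan",
         "master": "eth1",
         "ipam": { "type": "host-local", "subnet": "192.0.2.0/24" }
       }
   ---
   apiVersion: v1
   kind: Pod
   metadata:
     name: cnf-example
     annotations:
       k8s.v1.cni.cncf.io/networks: signalling-net  # secondary interface requested via annotation, not a native API
   spec:
     containers:
       - name: app
         image: registry.example.com/cnf:1.0        # placeholder image

Because the annotation and the behaviour of the secondary interface are defined by the multiplexer and its delegate plugins rather than by Kubernetes itself, native features such as network policies and Services generally do not apply to these interfaces, which is the inconsistency described above.
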
Dynamic network management
~~~~~~~~~~~~~~~~~~~~~~~~~~

**Related requirements:** ``inf.ntw.03`` in :ref:`chapters/chapter02:kubernetes architecture requirements`

**Baseline project:** *Kubernetes*

**Gap description:** Kubernetes does not have an API for network service management (for example, VPNs). Therefore, a CNI plugin, such as :cite:t:`github-multus`, needs to be used to expose APIs for network services. Alternatively, this is done today with Netconf and similar protocols, together with integration with SDN controllers, for example, to connect individual VPNs, such as L3VPNs, to the CNF on demand.

Control plane efficiency
~~~~~~~~~~~~~~~~~~~~~~~~

**Related requirements:** None

**Baseline project:** *Kubernetes*

**Gap description:** In situations where multiple sites/availability zones exist, an operator may choose to run multiple Kubernetes clusters, not only for security/multitenancy reasons, but also for fault, resilience, and latency purposes. This produces an overhead of Kubernetes control plane nodes. There should be a way to operate multiple clusters more efficiently, while still meeting the operator's non-functional requirements, such as fault tolerance, resilience, and latency.

Interoperability with VRF-based networking
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Related requirements:** None

**Baseline project:** *Kubernetes*

**Gap description:** In existing networks, L3 VRFs/VPNs are commonly used for traffic separation (for example, separate L3 VPNs for signalling, charging, LI, and O&M). CNFs have to interwork with existing network elements, so a Kubernetes Pod needs to be connected to an L3 VPN. Currently, this is only possible via Multus. However, there is typically a network orchestration responsibility to connect the network interface to a gateway router, where the L3 VPN is terminated. This network orchestration is not taken care of by Kubernetes, nor is there a production-grade solution in the open-source space that takes care of it.

.. note::
   With an underlying IaaS, this is possible. However, it introduces a dependency between workload orchestration in Kubernetes and infrastructure orchestration in the IaaS, which is not desirable.

Hardware topology-aware huge pages
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Related requirements:** ``infra.com.cfg.004`` and ``infra.com.cfg.002`` in the Virtual Compute Profiles section in Chapter 5 of :cite:t:`refmodel`.

**Baseline project:** *Kubernetes*

**Gap description:** The Memory Manager was added in v1.21 as an alpha feature. For details, see :ref:`chapters/chapter03:management of memory and huge pages resources`.

User namespaces in Kubernetes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Related requirements:** ``e.man.004`` in the Cloud Infrastructure Management Capabilities section in Chapter 4 of :cite:t:`refmodel`, ``inf.ntw.03`` in :ref:`chapters/chapter02:kubernetes architecture requirements`

**Baseline project:** *Kubernetes*

**Gap description:** Kubernetes does not support namespace-scoped user IDs (UIDs). Therefore, when a CNF requires system privileges, the container either needs to run in privileged mode, or the infrastructure needs to provide random system UIDs. Randomised UIDs result in errors when the application needs to set kernel capabilities (for example, in the case of VLAN trunking), or when a Pod shares data with other Pods via persistent storage. The "privileged mode" solution is not secure, while the "random UID" solution is error-prone. These techniques should therefore not be used. Support for proper user namespaces has been introduced as an alpha feature in Kubernetes 1.25 :cite:t:`kubernetes-user-namespaces` (relevant KEP: :cite:t:`kubernetes-kep-user-namespaces`).
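As a sketch of how this alpha feature is expected to be consumed, a Pod can opt out of the host user namespace via ``hostUsers: false`` in its specification; the field and feature gate come from the Kubernetes 1.25 user-namespaces work cited above, while the Pod name and image below are illustrative assumptions.

.. code-block:: yaml

   # Illustrative only: requesting a per-Pod user namespace
   # (alpha in Kubernetes 1.25, behind the UserNamespacesStatelessPodsSupport feature gate).
   apiVersion: v1
   kind: Pod
   metadata:
     name: cnf-userns-example                 # hypothetical Pod name
   spec:
     hostUsers: false                         # run the Pod in its own user namespace
     containers:
       - name: app
         image: registry.example.com/cnf:1.0  # placeholder image

With this in place, a UID that appears privileged inside the container is mapped to an unprivileged, Pod-specific UID range on the host, avoiding both the "privileged mode" and the "random UID" workarounds described above.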