Sidero

Sidero is an implementation of the Kubernetes Cluster API for deploying Kubernetes clusters on Talos, and was deprecated in favour of Omni for reasons that haven't been publicly articulated as far as I know. Speculating: Sidero Labs probably got frustrated with the dreadful design of Cluster API and the constant ping-ponging between different components in order to troubleshoot faults.

Implementation

Components:

  • Bootstrap: Cluster API Bootstrap Provider Talos (CABPT):
    • TalosConfig: describes the configuration elements used to template a Talos machine configuration.
    • TalosConfigTemplate: similar to TalosConfig, but can be referenced as the bootstrap config for a MachineDeployment, allowing reuse.
  • Control plane: Cluster API Control Plane Provider Talos (CACPPT):
    • TalosControlPlane: describes the cluster's control plane as a whole rather than a single node; the provider creates one control plane Machine per replica.
  • Infrastructure provider: Cluster API Provider Sidero (CAPS); folded into the sidero-controller-manager:
    • MetalCluster: Sidero's view of the Cluster resource, defines the control plane endpoint. The target of the Cluster.spec.infrastructureRef field.
    • MetalMachine: Sidero's view of a Machine resource, references a single Server or ServerClass from which a server will be provisioned. When SideroLink is enabled, the .status key provides information about the machine's current state.
    • MetalMachineTemplate: reusable MetalMachine template used for MachineDeployments or TalosControlPlanes that allocate multiple Machines.
    • ServerBindings: 1:1 mappings between Servers and MetalMachines, used internally to keep track of Servers allocated to Clusters for housekeeping operations.
  • Metal Controller Manager:
    • Environment: deployment environments, defining the kernel, kernel arguments and initrd. Architecture-specific.
    • Server: representation of physical machines. Servers are created when the machine PXE boots the Sidero agent and completes the discovery and registration process, where it provides hardware information.
    • ServerClass: can be used to group servers based on hardware information (see the sketch after this list).
  • Sidero Controller Manager hosts the metadata service, used to deliver each machine's Talos machine configuration as it boots.
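
As an illustration, a minimal ServerClass might look something like the sketch below. This is written against the metal.sidero.dev/v1alpha1 API; the qualifier fields vary between Sidero versions, and the manufacturer value here is a placeholder rather than anything my hardware actually reports:

apiVersion: metal.sidero.dev/v1alpha1
kind: ServerClass
metadata:
  name: raspberry-pi
spec:
  # Qualifiers are matched against the hardware information gathered during
  # the discovery and registration process; matching Servers join the class.
  qualifiers:
    cpu:
      - manufacturer: Broadcom  # placeholder value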

A very rough architecture diagram:

flowchart TB
  subgraph bootstrapProvider["Cluster API Bootstrap Provider Talos (CABPT)"]
    talosConfigCrd["TalosConfig"]
    talosConfigTemplateCrd["TalosConfigTemplate"]
  end
  subgraph controlPlaneProvider["Cluster API Control Plane Provider Talos (CACPPT)"]
    talosControlPlaneCrd["TalosControlPlane"]
  end
  subgraph providerSidero["Cluster API Provider Sidero (CAPS)"]
    metalClusterCrd["MetalCluster"]
    metalMachineCrd["MetalMachine"]
    metalMachineTemplateCrd["MetalMachineTemplate"]
    serverBindingsCrd["ServerBindings"]
  end

Each machine's Talos machine configuration is stored in a Secret referenced by the corresponding Machine's .spec.bootstrap.dataSecretName field.
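
For a quick overview of which Secret backs which Machine, the same custom-columns trick used later on this page applies:

kubectl get machine -o 'custom-columns=Name:{.metadata.name},Secret:{.spec.bootstrap.dataSecretName}'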

Monitoring deployments

  • Hierarchical progress view: watch --color clusterctl describe cluster management-plane --show-conditions all --color
  • Sidero logs: kubectl -n sidero-system logs deploy/sidero-controller-manager --follow. Some interesting containers:
    • DHCP, TFTP and HTTP server logs: --container manager
    • Siderolink logs: --container siderolink
    • Streamed server logs: --container serverlogs
    • Server events: --container serverevents
  • As a last resort, follow the node's kernel logs directly: talosctl --talosconfig management-plane-talosconfig.yaml --endpoints 192.168.51.33 --nodes 192.168.51.33 dmesg --follow

You can follow logs for an individual node as follows:

kubectl logs -n sidero-system deploy/sidero-controller-manager --follow --container serverlogs \
  | jq -r '. | select(.server_uuid == "00d03114-0000-0000-0000-dca632eeb6c0" or .machine == "workload-blue-cp-2kjml") | "\(.["talos-time"]): \(.msg)"'

Debugging Talos control planes

The default output for kubectl get talosconfig is a bit crap, yielding only the name with no context as to which Machine it's associated with. We can peek at the .metadata.ownerReferences field for this:

$ kubectl get talosconfig -o 'custom-columns=Name:{.metadata.name},Machine:{.metadata.ownerReferences[0].name}'
Name                     Machine
workload-blue-cp-47cfd   workload-blue-cp-g6sj4
workload-blue-cp-qh95b   workload-blue-cp-kl2mr
workload-blue-cp-qn77s   workload-blue-cp-wnhzw
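
To take the mapping one step further, down to the underlying Server, the MetalMachine's .spec.serverRef can be listed the same way (assuming a CAPS version that populates serverRef):

kubectl get metalmachine -o 'custom-columns=Name:{.metadata.name},Server:{.spec.serverRef.name}'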

Inspecting Talos machine configuration

First, list the TalosConfig objects and their owning Machines, as shown in the previous section.

Next, list machines:

$ kubectl get machine
NAME                     CLUSTER         NODENAME        PROVIDERID                                      PHASE         AGE   VERSION
workload-blue-cp-g6sj4   workload-blue                   sidero://00d03114-0000-0000-0000-dca632eeb6c0   Provisioned   12h   v1.29.2
workload-blue-cp-kl2mr   workload-blue                   sidero://00d03114-0000-0000-0000-dca632eeb65a   Provisioned   12h   v1.29.2
workload-blue-cp-wnhzw   workload-blue   talos-gmh-9yg   sidero://00d03114-0000-0000-0000-dca632eeb609   Running       12h   v1.29.2
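
The PROVIDERID column embeds the Server UUID, so the Machine-to-Server mapping can also be extracted directly:

kubectl get machine/workload-blue-cp-g6sj4 -o jsonpath='{.spec.providerID}'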

Assuming we're interested in Server/00d03114-0000-0000-0000-dca632eeb6c0, which is provisioned as Machine/workload-blue-cp-g6sj4, we know that our Talos configuration is workload-blue-cp-47cfd:

$ kubectl get talosconfig/workload-blue-cp-47cfd -o yaml
apiVersion: bootstrap.cluster.x-k8s.io/v1alpha3
kind: TalosConfig
metadata:
  creationTimestamp: "2024-03-03T01:30:01Z"
  generation: 1
  labels:
    cluster.x-k8s.io/cluster-name: workload-blue
  name: workload-blue-cp-47cfd
  namespace: default
  ownerReferences:
  - apiVersion: cluster.x-k8s.io/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Machine
    name: workload-blue-cp-g6sj4
    uid: 6ee11d07-f58d-4ccd-add3-86a74e593840
  resourceVersion: "887904"
  uid: 22189c69-2a77-4c36-9407-0f0015a53be7
spec:
  configPatches:
  - op: add
    path: /machine/network
    value:
      interfaces:
      - deviceSelector:
          driver: bcmgenet
          hardwareAddr: dc:a6:32:*
        dhcp: true
        vip:
          ip: 192.168.51.64
      kubespan:
        enabled: true
  - op: replace
    path: /cluster/allowSchedulingOnControlPlanes
    value: true
  generateType: controlplane
  hostname: {}
  talosVersion: v1.6.7
status:
  conditions:
  - lastTransitionTime: "2024-03-03T01:30:01Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-03-03T01:30:01Z"
    status: "True"
    type: ClientConfigAvailable
  - lastTransitionTime: "2024-03-03T01:30:01Z"
    status: "True"
    type: DataSecretAvailable
  dataSecretName: workload-blue-cp-g6sj4-bootstrap-data
  observedGeneration: 1
  ready: true
  talosConfig: |
    context: workload-blue
    contexts:
      workload-blue:
        endpoints: []
        ca: [snip]
        crt: [snip]
        key: [snip]
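
Incidentally, the .status.talosConfig key above is a ready-made talosctl client configuration. It can be extracted and pointed at a node; its endpoints list is empty, so --endpoints must be passed explicitly (here I'm using the cluster VIP from the config patch above):

kubectl get talosconfig/workload-blue-cp-47cfd -o jsonpath='{.status.talosConfig}' > workload-blue-talosconfig.yaml
talosctl --talosconfig workload-blue-talosconfig.yaml --endpoints 192.168.51.64 --nodes 192.168.51.64 version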

We can try to figure out what the final configuration might look like from .spec.configPatches, or we can use the .status.dataSecretName field, which references a Secret:

kubectl get secret/workload-blue-cp-g6sj4-bootstrap-data -o jsonpath='{.data.value}' | base64 -d
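
To sanity-check the rendered configuration, dump it to a file and feed it to talosctl validate (metal mode, since that's what Sidero provisions):

kubectl get secret/workload-blue-cp-g6sj4-bootstrap-data -o jsonpath='{.data.value}' \
  | base64 -d > workload-blue-cp-g6sj4-machineconfig.yaml
talosctl validate --config workload-blue-cp-g6sj4-machineconfig.yaml --mode metal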
