# Monitoring Stack

[Prometheus](https://prometheus.io/) is an open-source monitoring and alerting tool designed to collect metrics from various sources, store them efficiently, and enable querying and visualization of these metrics for monitoring purposes.

- **Key Features:**
  - Data Collection
  - Metrics Storage
  - Query Language
  - Alerting
  - Service Discovery
  - Visualizations
  - Exporters

The [kube-prometheus-stack](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack/) is pre-configured to gather metrics from all Kubernetes components, making it ideal for cluster monitoring. It provides a standard set of dashboards and alerting rules, many of which originate from the [kubernetes-mixin](https://github.com/kubernetes-monitoring/kubernetes-mixin/) project.

## Components of the kube-prometheus-stack

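1. **Prometheus Operator**: Manages Prometheus, Alertmanager, and related monitoring resources in your cluster through Kubernetes custom resources.
2. **Prometheus**: Scrapes and stores metrics and evaluates recording and alerting rules.
3. **Grafana**: Visualizes metrics in dashboards.
4. **Alertmanager**: Sends notifications (e.g., PagerDuty, Slack, email) for alerts received from the Prometheus server.
5. **node-exporter and kube-state-metrics**: Expose node-level and Kubernetes object metrics for Prometheus to scrape.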

### Data Gathering

Prometheus uses a pull model. It periodically scrapes an HTTP endpoint exposed by each target (by default `/metrics`) and stores the retrieved samples in its built-in time series database.
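The following sketch shows roughly what the pull model looks like in a standalone `prometheus.yml`. The job name and target address are hypothetical. With kube-prometheus-stack you would normally declare targets through ServiceMonitor resources managed by the operator, not by editing this file.

```yaml
# prometheus.yml (standalone Prometheus; not used by kube-prometheus-stack)
global:
  scrape_interval: 30s            # how often Prometheus pulls from each target

scrape_configs:
  - job_name: example-app         # hypothetical job name
    metrics_path: /metrics        # the default path, shown explicitly
    static_configs:
      - targets:
          - "example-app.default.svc:8080"   # hypothetical host:port serving /metrics
```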

Grafana queries Prometheus as a data source and renders the results as graphs organized into dashboards. Queries are written in PromQL, the Prometheus query language, and can also be run directly in the Prometheus web UI.
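Prometheus also uses PromQL to express recording and alerting rules. The sketch below assumes the Helm release is named `monitoring`, as in the setup guide. By default, the stack's Prometheus only loads `PrometheusRule` objects that carry a matching `release` label. The rule names and the 4-core threshold are hypothetical values chosen for illustration.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules                # hypothetical name
  namespace: monitoring
  labels:
    release: monitoring              # matches the stack's default ruleSelector (Helm release name)
spec:
  groups:
    - name: example.rules
      rules:
        # Recording rule: store per-namespace CPU usage (in cores) as a new series
        - record: namespace:container_cpu_usage_seconds:sum_rate5m
          expr: sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
        # Alerting rule: fire if a namespace uses more than 4 cores for 10 minutes (hypothetical threshold)
        - alert: NamespaceHighCPU
          expr: namespace:container_cpu_usage_seconds:sum_rate5m > 4
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Namespace {{ $labels.namespace }} is using more than 4 CPU cores"
```

Apply it with `kubectl apply -f`. The new series and alert should then appear in the Prometheus UI under **Status → Rules**.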

### Alerting

Alerts sent by client programs like the Prometheus server are managed by the [Alertmanager](https://github.com/prometheus/alertmanager/) component. It handles deduplication, grouping, and routing to the appropriate receiver integration, such as email, PagerDuty, or Slack.
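Grouping and routing are defined in the Alertmanager configuration. With kube-prometheus-stack, you can supply this configuration under the `alertmanager.config` key of your Helm values file. The minimal sketch below assumes a Slack incoming webhook. The webhook URL is a placeholder and the receiver name `slack-warnings` is hypothetical.

```yaml
alertmanager:
  config:
    route:
      receiver: "null"                      # fallback: drop anything not matched below
      group_by: ["alertname", "namespace"]  # batch alerts that share these labels
      group_wait: 30s                       # wait before the first notification for a new group
      group_interval: 5m                    # wait before notifying about new alerts in a group
      repeat_interval: 4h                   # resend while the alert keeps firing
      routes:
        - receiver: slack-warnings
          matchers:
            - 'severity=~"warning|critical"'
    receivers:
      - name: "null"
      - name: slack-warnings                # hypothetical receiver name
        slack_configs:
          - api_url: "https://hooks.slack.com/services/REPLACE/ME"   # placeholder webhook URL
            channel: "#alerts"
            send_resolved: true
```

Helm replaces lists instead of merging them. Setting `route.routes` and `receivers` here therefore overrides the chart's defaults for those keys, which is why the `null` receiver is declared again.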

### Documentation

For more information, please refer to the official documentation for each component:

- [Prometheus](https://prometheus.io/docs/introduction/overview/): Overview of features and configuration options.
- [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md): Useful information on using the operator.
- [Alertmanager](https://prometheus.io/docs/alerting/latest/alertmanager/): Learn about Alertmanager and its integrations with various notification platforms.

---

## Setup Guide

This guide walks you through deploying Prometheus and Grafana on your E2E Kubernetes cluster with secure HTTPS access via NGINX Ingress and Let's Encrypt certificates.

**Please refer to this document to connect your Kubernetes cluster first:**

* [How to Download kubeconfig.yaml File](/docs/myaccount/kubernetes/#how-to-download-kubeconfigyaml-file)

### Step 1: Install Ingress Controller

The Ingress controller allows external access to services running inside your cluster.

```bash
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --create-namespace
```

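After installation, find the public IP in the `EXTERNAL-IP` column of the `ingress-nginx-controller` service (type `LoadBalancer`). It may show `<pending>` while the load balancer is being provisioned: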

```bash
kubectl get svc -n ingress-nginx
```

> **Note:** The NGINX Ingress Controller is scheduled for retirement in March 2026.
> For new production deployments, consider using the
> [Kubernetes Gateway API](./kubernetes_gateway_api) instead.

### Step 2: Configure DNS Records

Create DNS A records pointing to the Ingress public IP:

| Type | Name | Value |
|------|------|-------|
| A | prometheus | `<Ingress-Controller-External-IP>` |
| A | grafana | `<Ingress-Controller-External-IP>` |

DNS changes may take a few minutes to propagate.

### Step 3: Install Prometheus and Grafana

```bash
kubectl create namespace monitoring
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  -n monitoring
```

Verify pods are running:

```bash
kubectl get pods -n monitoring
```

### Step 4: Enable SSL (HTTPS)

To secure access, cert-manager obtains TLS certificates from Let's Encrypt and renews them automatically before they expire.

**Install cert-manager:**

```bash
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/latest/download/cert-manager.crds.yaml

helm repo add jetstack https://charts.jetstack.io
helm repo update

helm install cert-manager jetstack/cert-manager \
  -n cert-manager \
  --create-namespace
```

Verify:

```bash
kubectl get pods -n cert-manager
```

### Step 5: Create SSL Issuer

Create `cluster-issuer.yaml`:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    email: your-email@yourdomain.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
```

Apply the configuration:

```bash
kubectl apply -f cluster-issuer.yaml
kubectl get clusterissuer
```
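While testing, you may prefer Let's Encrypt's staging environment. It has much higher rate limits but issues certificates that browsers do not trust. A staging issuer is a sketch that mirrors the production one above. Only the name, server URL, and key secret change, and the email is the same placeholder. To use it, set `cert-manager.io/cluster-issuer: letsencrypt-staging` on the Ingress, then switch back to `letsencrypt-prod` once everything works.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    email: your-email@yourdomain.com
    # Staging endpoint: higher rate limits, untrusted certificates
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-staging   # separate ACME account key from production
    solvers:
      - http01:
          ingress:
            class: nginx
```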

### Step 6: Expose Monitoring Using a Single Ingress

Create one Ingress resource for both Prometheus and Grafana.

Create `monitoring-ingress.yaml`:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: monitoring-ingress
  namespace: monitoring
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx   # replaces the deprecated kubernetes.io/ingress.class annotation
  tls:
    - hosts:
        - prometheus.yourdomain.com
        - grafana.yourdomain.com
      secretName: monitoring-tls
  rules:
    - host: prometheus.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: monitoring-kube-prometheus-prometheus
                port:
                  number: 9090
    - host: grafana.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: monitoring-grafana
                port:
                  number: 80
```

```bash
kubectl apply -f monitoring-ingress.yaml
```

Verify the Ingress:

```bash
kubectl get ingress -n monitoring
```

### Step 7: Access Monitoring Services

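Once cert-manager has issued the certificate (`kubectl get certificate -n monitoring` shows `monitoring-tls` with `READY` set to `True`), open the services in your browser: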

- **Prometheus:** `https://prometheus.yourdomain.com`
- **Grafana:** `https://grafana.yourdomain.com`

**Grafana default login:**
- **Username:** `admin`
- **Password:** Retrieve using the command below:

```bash
kubectl --namespace monitoring get secrets monitoring-grafana \
  -o jsonpath="{.data.admin-password}" | base64 -d ; echo
```

### Step 8: Import a Sample Dashboard

Ready-made Kubernetes dashboards are published on grafana.com and can be imported by ID, in addition to the default dashboards the stack already ships.

1. Log in to Grafana
2. Select **Dashboards → Import**
3. Enter **Dashboard ID:** `15661`
4. Select **Prometheus** as the data source
5. Click **Import**

You will now see cluster-level metrics including node resource overview, CPU/memory usage, and network traffic.
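You can also provision dashboards declaratively instead of importing them by hand. kube-prometheus-stack runs a Grafana sidecar that, with the chart's default settings, loads dashboards from ConfigMaps labeled `grafana_dashboard: "1"`. Check the `grafana.sidecar.dashboards` values for your chart version. The dashboard below is a hypothetical single-panel example built on a standard cAdvisor metric.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-cpu-dashboard        # hypothetical name
  namespace: monitoring
  labels:
    grafana_dashboard: "1"           # label the Grafana sidecar watches for (chart default)
data:
  example-cpu-dashboard.json: |
    {
      "title": "Example: CPU usage by namespace",
      "panels": [
        {
          "type": "timeseries",
          "title": "CPU usage (cores)",
          "gridPos": { "x": 0, "y": 0, "w": 24, "h": 8 },
          "targets": [
            { "expr": "sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=\"\"}[5m]))" }
          ]
        }
      ]
    }
```

The panel sets no data source, so Grafana should fall back to the default Prometheus data source that the stack configures.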

---

## Advanced Configuration

### Port Forwarding Access (Alternative)

If you prefer to access monitoring without Ingress, use port forwarding:

```bash
# List services
kubectl get svc -n monitoring

# Access Prometheus
kubectl port-forward svc/monitoring-kube-prometheus-prometheus -n monitoring 9090:9090
```

Navigate to `http://localhost:9090`. To see discovered targets: `http://localhost:9090/targets`

```bash
# Access Grafana
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring
```

Navigate to `http://localhost:3000`.

### Configuring ServiceMonitors for Prometheus

To monitor your own applications, create a `ServiceMonitor` resource. `ServiceMonitor` is a custom resource definition (CRD) provided by the Prometheus Operator. Each ServiceMonitor tells Prometheus which Services to scrape and how.

A typical ServiceMonitor configuration:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
    release: monitoring   # the stack's default serviceMonitorSelector matches the Helm release name
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: web
```

- **`spec.selector.matchLabels`**: Selects which Services to scrape by label. Here it selects Services labeled `app: example-app` in the same namespace as the ServiceMonitor.
- **`spec.endpoints[].port`**: The name of the Service port that exposes the metrics endpoint (here, `web`).
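For the ServiceMonitor above to find anything, a Service with matching labels and a named port must exist in the same namespace. A minimal sketch follows. The port number 8080 is an assumption about where the hypothetical `example-app` serves `/metrics`, which is the path a ServiceMonitor scrapes by default.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app
  labels:
    app: example-app        # matched by spec.selector.matchLabels in the ServiceMonitor
spec:
  selector:
    app: example-app        # selects the application's Pods
  ports:
    - name: web             # referenced by spec.endpoints[].port in the ServiceMonitor
      port: 8080            # assumed metrics port
      targetPort: 8080
```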

The `kube-prometheus-stack` Helm values file also accepts a `prometheus.additionalServiceMonitors` list, which lets you define ServiceMonitors directly in your values. Example for monitoring the NGINX Ingress Controller:

```yaml
prometheus:
  additionalServiceMonitors:
    - name: "ingress-nginx-monitor"
      selector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx
      namespaceSelector:
        matchNames:
          - ingress-nginx
      endpoints:
        - port: "metrics"
```
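This ServiceMonitor only finds a target if the controller actually exposes a metrics Service with a port named `metrics`. The ingress-nginx Helm chart creates that Service when `controller.metrics.enabled` is set. Below is a minimal values sketch. The file name `ingress-nginx-metrics.yaml` is hypothetical. Apply it with `helm upgrade ingress-nginx ingress-nginx/ingress-nginx -n ingress-nginx --reuse-values -f ingress-nginx-metrics.yaml`.

```yaml
# ingress-nginx-metrics.yaml (hypothetical file name)
controller:
  metrics:
    enabled: true   # creates the controller metrics Service with a port named "metrics"
```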

After adding services to monitor, upgrade the stack to apply changes:

```bash
helm upgrade monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  -f values.yaml
```

### Tweaking Helm Chart Values

Inspect all available options and default values for the kube-prometheus-stack Helm chart:

```bash
helm show values prometheus-community/kube-prometheus-stack
```
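As a starting point, a small `values.yaml` might override a few commonly tuned settings. The keys below exist in the chart's default values at the time of writing, but confirm them against `helm show values` for your chart version. The password, retention periods, and storage size are illustrative placeholders.

```yaml
# values.yaml: illustrative overrides only
grafana:
  adminPassword: "change-me"   # placeholder; do not keep the chart's default password

prometheus:
  prometheusSpec:
    retention: 15d             # how long Prometheus keeps samples
    # Select ServiceMonitors regardless of labels (default: only release=<release name>)
    serviceMonitorSelectorNilUsesHelmValues: false
    storageSpec:               # persist TSDB data on a PVC instead of an emptyDir
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 50Gi    # illustrative size

alertmanager:
  alertmanagerSpec:
    retention: 120h            # how long Alertmanager keeps alert data
```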

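After adjusting your values file (`values.yaml`), apply the changes. Pin `--version` to the chart version you already run (the version suffix in the `CHART` column of `helm list -n monitoring`). This way the upgrade applies only your value changes and does not also upgrade or downgrade the chart:

```bash
helm upgrade monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --version <CURRENT_CHART_VERSION> \
  -f values.yaml
```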

> **Security Note:** In this setup the Prometheus UI is reachable without a login, so avoid exposing monitoring services publicly (through a `LoadBalancer` Service or an Ingress) without access controls. TLS encrypts traffic but does not restrict who can connect. Add authentication, restrict access to trusted IP ranges, or prefer private networking or VPN-based access where possible.
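One way to restrict access with the NGINX Ingress Controller is the `nginx.ingress.kubernetes.io/whitelist-source-range` annotation on the `monitoring-ingress` resource. In the sketch below, documentation-range CIDRs stand in for your own trusted ranges. Check two assumptions before relying on it:

- The controller must see real client IPs. For a `LoadBalancer` Service this usually means `controller.service.externalTrafficPolicy: Local`.
- Certificate renewals through the HTTP-01 challenge must still succeed after the restriction is in place.

```yaml
# Fragment: merge into metadata.annotations of monitoring-ingress
metadata:
  annotations:
    # Comma-separated CIDRs allowed to reach Prometheus and Grafana; other clients get 403
    nginx.ingress.kubernetes.io/whitelist-source-range: "203.0.113.0/24,198.51.100.10/32"
```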

---

## Upgrading Kubernetes Prometheus Stack

Check available versions on the [`kube-prometheus-stack`](https://github.com/prometheus-community/helm-charts/releases) releases page or on [ArtifactHUB](https://artifacthub.io/packages/helm/prometheus-community/kube-prometheus-stack).

```bash
helm upgrade monitoring prometheus-community/kube-prometheus-stack \
  --version <KUBE_PROMETHEUS_STACK_NEW_VERSION> \
  --namespace monitoring \
  --values <YOUR_HELM_VALUES_FILE>
```

Replace `KUBE_PROMETHEUS_STACK_NEW_VERSION` with the target version and `YOUR_HELM_VALUES_FILE` with your values file path.

For command documentation, see [helm upgrade](https://helm.sh/docs/helm/helm_upgrade/).

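Before upgrading an existing release to a new major version, check the official [upgrade paths](https://github.com/prometheus-community/helm-charts/tree/main/charts/kube-prometheus-stack#upgrading-chart). `helm upgrade` does not update CRDs, so major version upgrades often require applying the updated Prometheus Operator CRDs manually first.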

---

## Uninstalling Kubernetes Prometheus Stack

```bash
helm uninstall monitoring -n monitoring
```

Delete the namespace:

```bash
kubectl delete ns monitoring
```

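Helm does not delete the CRDs installed by this chart. Remove them manually if you no longer need them. Deleting a CRD also deletes every custom resource of that type (ServiceMonitors, PrometheusRules, and so on) across the whole cluster: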

```bash
kubectl delete crd alertmanagerconfigs.monitoring.coreos.com
kubectl delete crd alertmanagers.monitoring.coreos.com
kubectl delete crd podmonitors.monitoring.coreos.com
kubectl delete crd probes.monitoring.coreos.com
kubectl delete crd prometheusagents.monitoring.coreos.com
kubectl delete crd prometheuses.monitoring.coreos.com
kubectl delete crd prometheusrules.monitoring.coreos.com
kubectl delete crd scrapeconfigs.monitoring.coreos.com
kubectl delete crd servicemonitors.monitoring.coreos.com
kubectl delete crd thanosrulers.monitoring.coreos.com
```


---