must-gather-clean
The workflow can optionally sanitize the collected must-gather output before creating the handoff archive. The sanitizer is openshift/must-gather-clean, a community-supported tool for obfuscating and omitting sensitive data from must-gather directories.
How Users Enable It
The Job Template survey exposes one cleaning control:
| Prompt | Variable | Values | Default |
|---|---|---|---|
| Run must-gather-clean | ocp_must_gather_clean_enabled | false, true | false |
When the value is false, the role archives the raw must-gather output.
When the value is true, the role:
- Runs the fixed oc adm must-gather collection.
- Runs must-gather-clean against the raw output directory.
- Writes the cleaned files to a separate cleaned output directory.
- Keeps report.yaml in a separate local report directory.
- Builds the final handoff archive from the cleaned directory.
Users cannot provide cleaner flags, paths, or config through the survey.
Repository Configuration
Two platform-owned cleaner configs are stored in:
config/must-gather-clean/openshift_default.yaml
config/must-gather-clean/openshift_omit_network.yaml
Both apply the same obfuscation rules. openshift_omit_network.yaml additionally
excludes openshift-sdn pod log files from the cleaned output. Those files are
absent from the handoff archive entirely — not obfuscated.
The role default points to the full config:
ocp_must_gather_clean_config: >-
  /../config/must-gather-clean/openshift_default.yaml
To use openshift_omit_network.yaml instead, set this as a Job Template extra var:
ocp_must_gather_clean_config: /runner/project/config/must-gather-clean/openshift_omit_network.yaml
Do not expose ocp_must_gather_clean_config as a survey field. The config
choice is a platform admin decision. Changes to either config should be
reviewed and versioned in this repository.
Choosing a Config
| Config | SDN pod logs in handoff archive |
|---|---|
| openshift_default.yaml | Yes, obfuscated |
| openshift_omit_network.yaml | No, excluded entirely |
Use openshift_omit_network.yaml only when the investigation focuses on
cluster configuration, operator state, or resources and SDN pod log content
is not required by the support case. If there is any doubt, use
openshift_default.yaml.
Current Behavior
Both configs obfuscate these values in retained files:
| Type | Replacement | Target |
|---|---|---|
| IP | consistent token such as x-ipv4-0000000022-x | paths and file contents |
| MAC | consistent token such as x-mac-0000000001-x | paths and file contents |
| Domain | consistent token such as domain0000000001 | paths and file contents |
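"Consistent" in the table above means that a given original value maps to the same token wherever it appears, so relationships between files survive cleaning. The Python sketch below illustrates that idea only. It is not the tool's implementation: the `ConsistentReplacer` class, the counter starting at 1, and the simplified IPv4 pattern are assumptions made for illustration, and only the token shape is taken from the table.

```python
import re

# Simplified IPv4 matcher for illustration; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class ConsistentReplacer:
    """Hypothetical helper: maps each distinct value to one stable token."""

    def __init__(self, template: str = "x-ipv4-{:010d}-x") -> None:
        self.template = template
        self.mapping: dict[str, str] = {}

    def token_for(self, value: str) -> str:
        # Reuse the existing token so repeated values stay correlated.
        if value not in self.mapping:
            self.mapping[value] = self.template.format(len(self.mapping) + 1)
        return self.mapping[value]

    def scrub(self, text: str) -> str:
        # Replace every IPv4-looking substring with its stable token.
        return IPV4.sub(lambda m: self.token_for(m.group(0)), text)


replacer = ConsistentReplacer()
line_a = replacer.scrub("podIP: 10.128.0.5")
line_b = replacer.scrub("dial tcp 10.128.0.5:8443: connection refused")
line_c = replacer.scrub("hostIP: 192.168.1.10")

print(line_a)  # podIP: x-ipv4-0000000001-x
print(line_b)  # dial tcp x-ipv4-0000000001-x:8443: connection refused
print(line_c)  # hostIP: x-ipv4-0000000002-x
```

The `mapping` dictionary in this sketch plays the role that report.yaml plays for the real tool: it links original values to their replacements, which is why that report has to be handled as sensitive.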
Both configs omit Kubernetes resources with these kinds when they are present in the must-gather:
- Secret
- ConfigMap
- CertificateSigningRequest
- CertificateSigningRequestList
- MachineConfig
openshift_omit_network.yaml additionally omits:
- openshift-sdn pod log files matching */namespaces/openshift-sdn/pods/*/*/*/logs/*.log
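To preview which files the omit pattern above would cover, a small Python check can be run against paths listed from a raw must-gather. This is a sketch under two assumptions that are not confirmed here: that the cleaner matches the pattern against paths relative to the must-gather root, and that `*` never crosses a `/` boundary. The sample paths are hypothetical.

```python
from fnmatch import fnmatchcase

OMIT_PATTERN = "*/namespaces/openshift-sdn/pods/*/*/*/logs/*.log"


def matches_segmentwise(path: str, pattern: str = OMIT_PATTERN) -> bool:
    """Match a relative path against a glob, one path segment at a time.

    Assumption: '*' stays within a single segment, so the path and the
    pattern must have the same number of '/'-separated parts.
    """
    path_parts = path.strip("/").split("/")
    pattern_parts = pattern.strip("/").split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(p, g) for p, g in zip(path_parts, pattern_parts))


# Hypothetical relative paths, for illustration only.
samples = [
    "image-dir/namespaces/openshift-sdn/pods/sdn-abcde/sdn/sdn/logs/current.log",
    "image-dir/namespaces/openshift-sdn/pods/sdn-abcde/sdn-abcde.yaml",
    "image-dir/namespaces/openshift-dns/pods/dns-default-xyz/dns/dns/logs/current.log",
]
results = {path: matches_segmentwise(path) for path in samples}

for path, omitted in results.items():
    print("omit" if omitted else "keep", path)
```

Only the first sample matches: the second is not a log file at the expected depth, and the third belongs to a different namespace. Python's plain `fnmatch` would let `*` span several directories, which is why the comparison is done per segment here.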
Some must-gather collections may not contain Secrets or ConfigMaps before cleaning. In that case, validate cleaning by comparing obfuscated values in retained files instead of expecting omission counts to differ.
Custom Obfuscation
must-gather-clean supports built-in obfuscators for IP addresses, MAC
addresses, and configured domain names. It also supports custom obfuscators
through Keywords and Regex.
The following examples are not enabled in this repository by default. They show
optional upstream must-gather-clean features that platform admins can add to
either config file after review.
Use Keywords when specific known strings should be replaced:
config:
  obfuscate:
    - type: Keywords
      replacement:
        internal-cluster-name: cluster-name-redacted
Use Regex when sensitive values follow a predictable pattern:
config:
  obfuscate:
    - type: Regex
      regex: "token-[A-Za-z0-9]+"
The upstream project also supports targeting file contents, file paths, or both.
For path-sensitive values, use target: All or target: FilePath intentionally.
Custom obfuscators should be specific and reviewed carefully because broad
patterns can make the cleaned output less useful for support.
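Before proposing a Regex obfuscator for review, it can help to try the pattern against representative lines and see what it would touch. The sketch below uses Python's `re` module purely as a quick check. must-gather-clean is written in Go, whose regular expression syntax omits some Python features such as lookarounds and backreferences, so this only approximates the tool's behavior. The sample lines and the over-broad pattern are hypothetical.

```python
import re

# Pattern from the Regex example above.
CANDIDATE = re.compile(r"token-[A-Za-z0-9]+")
# Hypothetical over-broad pattern, shown only to illustrate the risk.
TOO_BROAD = re.compile(r"[A-Za-z0-9]+-[A-Za-z0-9]+")

# Hypothetical log lines.
samples = [
    "auth: token-9fA3kQ accepted",
    "pod kube-apiserver restarted",
    "node worker-0 is Ready",
]


def hits(pattern: re.Pattern, lines: list[str]) -> list[list[str]]:
    """Return, per line, every substring the pattern would match."""
    return [pattern.findall(line) for line in lines]


specific_hits = hits(CANDIDATE, samples)
broad_hits = hits(TOO_BROAD, samples)

print(specific_hits)  # [['token-9fA3kQ'], [], []]
print(broad_hits)     # [['token-9fA3kQ'], ['kube-apiserver'], ['worker-0']]
```

The specific pattern touches only the token, while the broad one would also rewrite component and node names that support engineers usually need to read.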
See the upstream must-gather-clean configuration reference for the complete schema and examples.
Validation Example
A practical validation is to compare the same retained file in raw and cleaned archives. For example:
cluster-scoped-resources/operator.openshift.io/networks/cluster.yaml
Raw must-gather:
spec:
  clusterNetwork:
  - cidr: 10.128.0.0/14
Cleaned must-gather:
spec:
  clusterNetwork:
  - cidr: x-ipv4-0000000022-x/14
This shows that the file was retained, the YAML structure stayed usable, and the IP address portion of the CIDR was obfuscated.
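The same comparison can be scripted. The following Python sketch checks that the cleaned text contains no IPv4-looking literals, that at least one replacement token is present, and that the line count is unchanged. The function name and the simplified IPv4 pattern are assumptions for illustration. The pattern can also flag dotted version strings, so results still need a human look.

```python
import re

# Simplified IPv4 matcher; it does not validate octet ranges.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Token shape taken from the cleaned example above.
IPV4_TOKEN = re.compile(r"x-ipv4-\d{10}-x")


def compare_raw_and_cleaned(raw_text: str, cleaned_text: str) -> dict:
    """Summarize what changed between a raw file and its cleaned copy."""
    return {
        "raw_ips": IPV4.findall(raw_text),
        "leaked_ips": IPV4.findall(cleaned_text),
        "tokens": IPV4_TOKEN.findall(cleaned_text),
        "same_line_count": len(raw_text.splitlines()) == len(cleaned_text.splitlines()),
    }


raw = "spec:\n  clusterNetwork:\n  - cidr: 10.128.0.0/14\n"
cleaned = "spec:\n  clusterNetwork:\n  - cidr: x-ipv4-0000000022-x/14\n"
result = compare_raw_and_cleaned(raw, cleaned)

print(result["raw_ips"])     # ['10.128.0.0']
print(result["leaked_ips"])  # []
print(result["tokens"])      # ['x-ipv4-0000000022-x']
```

An empty `leaked_ips` list together with a non-empty `tokens` list is the outcome to expect for a retained file that originally contained IP addresses.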
Report Handling
must-gather-clean writes a report.yaml file that maps original values to
their replacements. Treat that report as sensitive.
The role writes the report outside the final handoff directory and validates that it is not bundled into the cleaned archive. The normal successful cleanup path removes the local work directory after the archive has been produced and any configured upload has completed.
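An independent spot check of a handoff archive might look like the Python sketch below. It is not the role's own validation. It assumes the archive is a gzip-compressed tarball, which this document does not state, and the member names in the demo archives are hypothetical.

```python
import io
import tarfile


def report_members(archive: tarfile.TarFile, report_name: str = "report.yaml") -> list[str]:
    """Return member paths whose final path component equals report_name."""
    return [
        member.name
        for member in archive.getmembers()
        if member.name.rsplit("/", 1)[-1] == report_name
    ]


def build_demo_archive(names: list[str]) -> io.BytesIO:
    """Build a small in-memory .tar.gz so the check can be tried without a cluster."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in names:
            data = b"placeholder\n"
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buffer.seek(0)
    return buffer


# Hypothetical archive contents.
good = build_demo_archive(["must-gather-clean/cluster-scoped-resources/networks.yaml"])
with tarfile.open(fileobj=good, mode="r:gz") as tar:
    clean_result = report_members(tar)

bad = build_demo_archive(["must-gather-clean/report.yaml"])
with tarfile.open(fileobj=bad, mode="r:gz") as tar:
    bad_result = report_members(tar)

print(clean_result)  # []
print(bad_result)    # ['must-gather-clean/report.yaml']
```

Against a real archive, `tarfile.open(path, mode="r:*")` lets Python detect the compression automatically, and any non-empty result means the archive should not be handed off.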
Runtime Notes
Cleaning can be significantly slower than raw collection because the tool reads and rewrites retained files. In current validation, a clean-enabled run took substantially longer than a raw must-gather run. Enable the survey toggle only when a cleaned artifact is required.
openshift_omit_network.yaml typically completes faster than openshift_default.yaml
on must-gathers from SDN-heavy clusters because openshift-sdn pod logs, which
tend to be large and IP-dense, are excluded before the obfuscation passes
run. This is a consequence of the reduced scope, not a tuning objective.