How secret scanning works

Secret scanners should conduct ongoing reconnaissance of unsecured secrets stored as plain text in code repositories as well as configuration, DevOps, and collaboration tools.

Rich DuBose

Apr 18, 2024

Rich DuBose

Secret scanning is crucial for securing an enterprise’s security management lifecycle. Secret scanning helps identify and prevent security threats posed by exposed sensitive information, passwords, API keys, and other credentials.

GitHub’s Octoverse highlights several mediums where sensitive information may be exposed, including code, configuration tools, CI/CD platforms, and communication channels used to collaborate. When discovered bad actors, this type of information can be used to access systems and associated data, resulting in data breaches and other security incidents.

Secret scanning solutions proactively identify potential security threats so they can be remedied before they can be exploited. Scanning solutions search code repositories, commits, configuration tools, and other data sources for sensitive information, passwords and access keys.

Secret scanning is a key component of modern security strategies that helps organizations in several ways, including:

Preventing data breaches: Secret scanning solutions help prevent data breaches by identifying and remediating threats before leaked passwords or API keys can be exploited.
Improving compliance: Many organizations are subject to regulatory frameworks designed to protect sensitive information, including publicly identifiable information (PII).
Protecting reputations: Security incidents can significantly damage an organization's reputation, which affects their ability to conduct business and negatively impacts revenue.
Cost reduction and avoidance: Organizations affected by data breaches or other security incidents incur significant costs. The primary costs associated with security incidents include legal fees, remediation, additional auditing, and lost business. IBM’s Costs of a Data Breach Report estimates that the average cost of a data breach is $4.45 million.

»Approaches to secret scanning

There are multiple programmatic strategies to secret scanning, including:

Regular expression scanning: Regular expressions are simply a sequence of characters that specify a matching pattern in text. Regular expressions scans are useful for evaluating types of sensitive information that often follow a pattern like API keys or access tokens.
Dictionary scanning: The dictionary approach involves using pre-defined data sources of known secrets to identify potential vulnerabilities. The data source may originate from a series of secrets directly entered into the scanning solution or a secrets management platform like HashiCorp Vault. Dictionary scanning focuses on evaluating log files, code repositories, and configuration tools. This approach to scanning is especially effective when a secrets management tool is used as the dictionary’s data source. This approach allows users to understand if a secret is current or no longer in use — something regular expression scanning cannot do.
Hybrid scanning: Hybrid scanning combines multiple evaluation approaches. Using regular expression and dictionary-based scanning together would be an example of hybrid scanning. More scanning approaches increases the effectiveness of secret scanning and detects a broader range of secrets while also raising fewer false positives.

»Common locations for secrets

Secrets can hide in many places, so it’s important to scan for secrets in the most likely locations:

Code: Developers may accidentally hard-code sensitive information, passwords, or API keys directly into their application code or configuration files.
Collaboration tools: Venues like Confluence, JIRA, Slack, and other tools are used for collaboration between DevOps, DevSecOps, Platform Ops, and business partners. During collaboration, secrets and sensitive information may be left on these platforms, which may fall outside the organization’s security policies.
Container images: Container images are a common location to hold hard-coded secrets, which can make containers vulnerable to attackers. When a developer uses base images, especially from a public registry like Docker Hub, they are leveraging external code that should be considered untrusted. These containers may contain hard-coded secrets.
The broader technology stack: A DevOps stack includes a wide range of tools and services, such as repositories, build systems, and deployment pipelines, that may contain secrets that are not properly protected.

»HCP Vault Radar as your secrets scanning solution

HCP Vault Radar, now in limited availability, is an extension to the HashiCorp Vault secrets management platform that conducts ongoing reconnaissance of unsecured secrets stored as plaintext in code repositories, configuration tools, DevOps tools, and collaboration tools.

HCP Vault Radar employs a hybrid scanning approach using both regular expressions and dictionaries to find leaked secrets and sensitive information. This broad set of evaluation techniques makes HCP Vault Radar an effective secrets scanning solution that can significantly reduce your organization’s attack surface and risk of data breach.

HCP Vault Radar focuses on several areas to ensure its effectiveness in secrets discovery.

Developer experience: HCP Vault Radar supports Git-based source control tools like GitHub, GitLab, and BitBucket. It can be automated to conduct scans over code repositories but also supports a developer’s native workflow by scanning commits and pull requests.

Coverage: HCP Vault Radar provides comprehensive coverage of relevant locations where secrets may be found. Supported locations include:

Code repositories
Container images
Configuration files and tools like AWS Parameter Store
Amazon S3
Confluence
File directories
Databases

Wide coverage helps ensure vulnerabilities are identified across all areas of the software supply chain process, and broad integrations ensure that exposures can be remedied within common development workflows.

Accuracy: HCP Vault Radar leverages hybrid approach using a broad array of scanning approaches, including:

Scanning for known patterns
Testing for high entropy
Check for liveness whenever possible
Ignore rules that can help users tune secrets detection

This hybrid secret scanning approach reduces both false positives and false negatives. False positives can lead to wasted time and effort on the exploration and remediation of non-existent issues, while false negatives can leave vulnerabilities undetected and expose the organization to risk.

Monitoring and alerting: HCP Vault Radar provides monitoring and alerting capabilities to enable quick detection and remediation. Real-time alerts and notifications can be configured to fire when vulnerabilities are identified, and can be integrated with existing incident-response workflows.

Prioritization: HCP Vault Radar is a risk-based code security platform that prioritizes evaluation results based on the presence of:

High-risk content in code (secrets in code, PII, and other high risk content)
Risks due to misconfigurations of Git repositories
Non-conformance with access and identity governance best practices

Customization: You can customize HCP Vault Radar’s scanning rules to meet the specific needs of your organization. This includes defining custom rules for identifying and prioritizing the sensitive data it discovers.

Vault Radar works with a variety of environments.

»Getting started

HCP Vault Radar is an exciting new addition to HashiCorp Vault’s secret lifecycle management capabilities that helps enterprises reduce risk associated with credential exposure. Discovery of unmanaged secrets and subsequent remediation workflows further differentiate Vault’s secrets lifecycle management offering by enabling organizations to take a proactive approach to remediation before a data breach occurs.

To learn more, check out these resources: