Skip to content

[INS-455] Unify common logic in Atlassian Data Center detectors#4907

Open
mustansir14 wants to merge 5 commits intomainfrom
INS-455-Unify-common-code-in-Atlassian-Data-Center-detectors
Open

[INS-455] Unify common logic in Atlassian Data Center detectors#4907
mustansir14 wants to merge 5 commits intomainfrom
INS-455-Unify-common-code-in-Atlassian-Data-Center-detectors

Conversation

@mustansir14
Copy link
Copy Markdown
Contributor

@mustansir14 mustansir14 commented Apr 22, 2026

Background

Three detectors target Atlassian Data Center (self-hosted) products: JiraDataCenterPAT, ConfluenceDataCenter, and BitbucketDataCenter. Reviewing them side-by-side revealed significant duplication: identical structural-validation logic, the same bearer-auth HTTP plumbing, and the same URL-extraction pipeline repeated verbatim in each file.

What changed

New shared package: pkg/detectors/atlassiandatacenter/

Following the pattern established by pkg/detectors/aws/, the three detectors are now co-located under a common parent that also houses shared utilities in common.go.

GetDCTokenPat(prefixes []string) *regexp.Regexp
Returns the compiled PAT regex for Jira/Confluence DC tokens (44-char base64 strings whose decoded form is <numeric-id>:<random-bytes>). Previously each detector hand-rolled the same pattern with inline literal strings.

IsStructuralPAT(candidate string) bool
Validates that a base64 candidate decodes to the <digits>:<bytes> structure. This function was copy-pasted verbatim between Jira and Confluence.

GetURLPat(prefixes []string) *regexp.Regexp
Returns the compiled URL regex for self-hosted Atlassian instance URLs. Each detector calls this once at package init time and stores the result in a package-level var, preserving the same compile-once behaviour as before. go-re2 compiles via CGo/FFI so doing this per-chunk would be a meaningful regression.

FindEndpoints(data string, urlPat *regexp.Regexp, resolve func(...string) []string) []string
Extracts keyword-scoped URLs from a chunk using the pre-compiled urlPat, passes them through s.Endpoints (which merges configured endpoints), deduplicates, and strips trailing slashes. All three detectors had their own multi-step version of this pipeline.

MakeVerifyRequest(ctx, client, fullURL, token string) (bool, map[string]any, error)
Sends a Bearer-authenticated GET and interprets the response: 200(true, decoded-JSON-body, nil), 401(false, nil, nil), other → (false, nil, error). Previously each detector built the request inline and contained its own copy of the 200 / 401 / default status-code switch. Jira reads body["displayName"] and body["emailAddress"] from the returned map; Confluence and Bitbucket discard it.

Important Note

It appears that Bitbucket DC detector was introduced but never registered in defaults.go. This PR also registers it. With that in mind, and of course to test the other changes as well, I also ran the usual corpora tests we do for new detectors.

Detector changes

Each detector now imports the shared package and delegates to it:

Before After
Private isStructuralPAT (Jira + Confluence) atlassiandatacenter.IsStructuralPAT
Inline PAT regex literal (Jira + Confluence) atlassiandatacenter.GetDCTokenPat(keywords)
10–15 line URL-find + dedup + resolve block (all three) atlassiandatacenter.FindEndpoints(...)
Inline request build + status-code switch (all three) atlassiandatacenter.MakeVerifyRequest(...)

The URL regex is also standardised across all three detectors. Confluence and Bitbucket previously used an unbounded \d+ port pattern and allowed hostnames starting with . or -; all three now use [a-zA-Z0-9][a-zA-Z0-9.\-]* with \d{1,5} for the port (matching the stricter Jira original).

A keywords package-level variable is introduced in Jira (Confluence and Bitbucket already had one) so the keyword list is defined once and reused by the regex, FindEndpoints, and Keywords().

TestIsStructuralPAT migrated

The structural-PAT test previously lived in jiradatacenterpat_test.go and called the private function. It is now in common_test.go and covers tokens from all three products. common_test.go also adds tests for GetDCTokenPat, FindEndpoints, and MakeVerifyRequest.

What did not change

  • Each detector retains its own package, Scanner type, Keywords(), Type(), Description(), and FromData() logic.
  • Jira's JSON response parsing (displayName, emailAddress) is unchanged.
  • Confluence's invalidHosts cache and errNoHost sentinel are unchanged.
  • Bitbucket's ?limit=1 query parameter is unchanged.
  • All existing tests pass without modification.

Corpora Tests

image image

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

Note

Medium Risk
Touches secret-detection regexes and verification request handling for Jira/Confluence/Bitbucket Data Center, which could change match/verify behavior (FP/FN) despite being mostly refactoring. Also registers the Bitbucket DC detector in defaults, increasing scan surface area.

Overview
Refactors the Jira/Confluence/Bitbucket Data Center detectors to use a new shared atlassiandatacenter helper package for PAT/URL regex construction, structural token validation, endpoint extraction/deduping, and Bearer-auth verification requests.

Standardizes URL matching across the detectors and simplifies their verification code paths by delegating status-code handling and optional JSON decoding to MakeVerifyRequest. Separately, the Bitbucket Data Center detector is now included in the default detector list and engine tests are updated to treat it as having no cloud endpoint, with new unit tests added for the shared helpers.

Reviewed by Cursor Bugbot for commit dc94300. Bugbot is set up for automated code reviews on this repo. Configure here.

@mustansir14 mustansir14 requested a review from a team April 22, 2026 08:48
@mustansir14 mustansir14 requested a review from a team as a code owner April 22, 2026 08:48
Comment thread pkg/detectors/atlassiandatacenter/common.go Outdated
@mustansir14 mustansir14 requested a review from a team as a code owner April 22, 2026 09:20
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 7c19046. Configure here.

&billomat.Scanner{},
&bingsubscriptionkey.Scanner{},
&bitbar.Scanner{},
&bitbucketdatacenter.Scanner{},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bitbucket detector silently added to default scanner list

Medium Severity

The bitbucketdatacenter.Scanner{} is being newly registered in the default detector list (both import and scanner instantiation are pure additions with no corresponding removal). The detector previously existed as code but was not included in the default scan engine. This PR's stated goal is to "unify common logic," but silently enabling a previously-unregistered detector in production scans could produce unexpected new findings and has performance implications. The addition to noCloudEndpointDetectors in the test file confirms this is a net-new registration.

Additional Locations (2)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 7c19046. Configure here.

Copy link
Copy Markdown
Contributor Author

@mustansir14 mustansir14 Apr 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was missed in the original PR. We want to enable it. Added a note about this in the PR description as well

@MuneebUllahKhan222
Copy link
Copy Markdown
Contributor

This is completely irrelevant to the goal of this PR. But can you please add similar logic in bitbucketdatacenter detector

if !emitted {
	// No reachable URL in context — emit an unverified token-only
	// result and annotate why we couldn't verify.
	results = append(results, detectors.Result{
		DetectorType: detector_typepb.DetectorType_ConfluenceDataCenter,
		Raw:          []byte(token),
		RawV2:        []byte(token),
		ExtraData: map[string]string{
			"verification_note": "no reachable Confluence Data Center URL found in context; token reported unverified",
		},
	})
}

Copy link
Copy Markdown
Contributor

@MuneebUllahKhan222 MuneebUllahKhan222 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of changes that need to be addressed, Other than that the PR is good to go.

@mustansir14
Copy link
Copy Markdown
Contributor Author

mustansir14 commented Apr 23, 2026

This is completely irrelevant to the goal of this PR. But can you please add similar logic in bitbucketdatacenter detector

if !emitted {
	// No reachable URL in context — emit an unverified token-only
	// result and annotate why we couldn't verify.
	results = append(results, detectors.Result{
		DetectorType: detector_typepb.DetectorType_ConfluenceDataCenter,
		Raw:          []byte(token),
		RawV2:        []byte(token),
		ExtraData: map[string]string{
			"verification_note": "no reachable Confluence Data Center URL found in context; token reported unverified",
		},
	})
}

I don't really think that's a good idea, as this will add more complexity to the PR and will require additional tests. I would say we stick to the goal of this PR and do this in a separate optimizations PR.

Copy link
Copy Markdown
Contributor

@MuneebUllahKhan222 MuneebUllahKhan222 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes I requested earlier seems to be out of scope for this PR and maybe we can cater them in some other PR. So I am approving this.

@mustansir14 mustansir14 added review/product-eng Team integrations reviewed, awaiting product-eng review labels Apr 24, 2026
@mustansir14 mustansir14 requested a review from camgunz May 7, 2026 05:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

review/product-eng Team integrations reviewed, awaiting product-eng review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants