[INS-455] Unify common logic in Atlassian Data Center detectors #4907
mustansir14 wants to merge 5 commits into main
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 7c19046.
    &billomat.Scanner{},
    &bingsubscriptionkey.Scanner{},
    &bitbar.Scanner{},
    &bitbucketdatacenter.Scanner{},
Bitbucket detector silently added to default scanner list
Medium Severity
The bitbucketdatacenter.Scanner{} is being newly registered in the default detector list (both import and scanner instantiation are pure additions with no corresponding removal). The detector previously existed as code but was not included in the default scan engine. This PR's stated goal is to "unify common logic," but silently enabling a previously-unregistered detector in production scans could produce unexpected new findings and has performance implications. The addition to noCloudEndpointDetectors in the test file confirms this is a net-new registration.
Yes, this was missed in the original PR. We want to enable it. I have added a note about this in the PR description as well.
This is completely irrelevant to the goal of this PR. But can you please add similar logic in
MuneebUllahKhan222 left a comment
A couple of changes need to be addressed; other than that, the PR is good to go.
I don't really think that's a good idea, as this will add more complexity to the PR and will require additional tests. I would say we stick to the goal of this PR and do this in a separate optimizations PR.
MuneebUllahKhan222 left a comment
The changes I requested earlier seem to be out of scope for this PR; we can address them in a separate PR. So I am approving this.


Background
Three detectors target Atlassian Data Center (self-hosted) products: JiraDataCenterPAT, ConfluenceDataCenter, and BitbucketDataCenter. Reviewing them side by side revealed significant duplication: identical structural-validation logic, the same bearer-auth HTTP plumbing, and the same URL-extraction pipeline repeated verbatim in each file.

What changed
New shared package: pkg/detectors/atlassiandatacenter/

Following the pattern established by pkg/detectors/aws/, the three detectors are now co-located under a common parent that also houses shared utilities in common.go.

GetDCTokenPat(prefixes []string) *regexp.Regexp
Returns the compiled PAT regex for Jira/Confluence DC tokens (44-char base64 strings whose decoded form is <numeric-id>:<random-bytes>). Previously each detector hand-rolled the same pattern with inline literal strings.

IsStructuralPAT(candidate string) bool
Validates that a base64 candidate decodes to the <digits>:<bytes> structure. This function was copy-pasted verbatim between Jira and Confluence.

GetURLPat(prefixes []string) *regexp.Regexp
Returns the compiled URL regex for self-hosted Atlassian instance URLs. Each detector calls this once at package init time and stores the result in a package-level var, preserving the same compile-once behaviour as before. go-re2 compiles via CGo/FFI, so doing this per-chunk would be a meaningful regression.

FindEndpoints(data string, urlPat *regexp.Regexp, resolve func(...string) []string) []string
Extracts keyword-scoped URLs from a chunk using the pre-compiled urlPat, passes them through s.Endpoints (which merges configured endpoints), deduplicates, and strips trailing slashes. All three detectors had their own multi-step version of this pipeline.

MakeVerifyRequest(ctx, client, fullURL, token string) (bool, map[string]any, error)
Sends a Bearer-authenticated GET and interprets the response: 200 → (true, decoded-JSON-body, nil), 401 → (false, nil, nil), other → (false, nil, error). Previously each detector built the request inline and contained its own copy of the 200 / 401 / default status-code switch. Jira reads body["displayName"] and body["emailAddress"] from the returned map; Confluence and Bitbucket discard it.

Important Note
It appears that the Bitbucket DC detector was introduced but never registered in defaults.go. This PR also registers it. With that in mind, and to test the other changes as well, I also ran the usual corpora tests we do for new detectors.

Detector changes
Each detector now imports the shared package and delegates to it:
- isStructuralPAT (Jira + Confluence) → atlassiandatacenter.IsStructuralPAT
- atlassiandatacenter.GetDCTokenPat(keywords)
- atlassiandatacenter.FindEndpoints(...)
- atlassiandatacenter.MakeVerifyRequest(...)

The URL regex is also standardised across all three detectors. Confluence and Bitbucket previously used an unbounded \d+ port pattern and allowed hostnames starting with . or -; all three now use [a-zA-Z0-9][a-zA-Z0-9.\-]* with \d{1,5} for the port (matching the stricter Jira original).

A keywords package-level variable is introduced in Jira (Confluence and Bitbucket already had one) so the keyword list is defined once and reused by the regex, FindEndpoints, and Keywords().

TestIsStructuralPAT migrated

The structural-PAT test previously lived in jiradatacenterpat_test.go and called the private function. It is now in common_test.go and covers tokens from all three products. common_test.go also adds tests for GetDCTokenPat, FindEndpoints, and MakeVerifyRequest.

What did not change
- Scanner type, Keywords(), Type(), Description(), and FromData() logic.
- … (displayName, emailAddress) is unchanged.
- invalidHosts cache and errNoHost sentinel are unchanged.
- ?limit=1 query parameter is unchanged.

Corpora Tests
Checklist:
- … (make test-community)?
- … (make lint this requires golangci-lint)?

Note
Medium Risk
Touches secret-detection regexes and verification request handling for Jira/Confluence/Bitbucket Data Center, which could change match/verify behavior (FP/FN) despite being mostly refactoring. Also registers the Bitbucket DC detector in defaults, increasing scan surface area.
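
To make the match-behavior risk concrete, here is a minimal sketch of how the stricter host and port rules quoted in the PR description differ from the looser ones. It is not the detectors' actual pattern: the https?:// scheme prefix, the ^ and $ anchors, the variable names, and the use of the standard library regexp package (rather than go-re2 with keyword prefixes) are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns for illustration only. The real detectors build theirs
// from keyword prefixes via GetURLPat and compile them with go-re2.
var (
	// Loose: the host may start with '.' or '-', and the port length is unbounded.
	looseURLPat = regexp.MustCompile(`^https?://[a-zA-Z0-9.\-]+(?::\d+)?$`)
	// Strict: the host must start with an alphanumeric, and the port is 1 to 5 digits.
	strictURLPat = regexp.MustCompile(`^https?://[a-zA-Z0-9][a-zA-Z0-9.\-]*(?::\d{1,5})?$`)
)

func main() {
	candidates := []string{
		"https://jira.example.com:8443",
		"https://-bad.example.com",
		"https://jira.example.com:1234567",
	}
	for _, u := range candidates {
		fmt.Printf("%-36s loose=%v strict=%v\n",
			u, looseURLPat.MatchString(u), strictURLPat.MatchString(u))
	}
}
```

Under these assumptions, candidates that the loose pattern accepted for Confluence and Bitbucket (a leading hyphen, an overlong port) no longer match, which is the kind of FP/FN shift the risk note refers to.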
Overview
Refactors the Jira/Confluence/Bitbucket Data Center detectors to use a new shared atlassiandatacenter helper package for PAT/URL regex construction, structural token validation, endpoint extraction/deduping, and Bearer-auth verification requests.

Standardizes URL matching across the detectors and simplifies their verification code paths by delegating status-code handling and optional JSON decoding to MakeVerifyRequest. Separately, the Bitbucket Data Center detector is now included in the default detector list and engine tests are updated to treat it as having no cloud endpoint, with new unit tests added for the shared helpers.

Reviewed by Cursor Bugbot for commit dc94300.
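
For readers who have not opened the diff, the structural token check and the status-code handling described in the PR can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in common.go: the lowercase names isStructuralPAT and interpretResponse are hypothetical, the requirement of at least one byte after the colon is assumed, the 44-character length check is omitted, and the HTTP call itself is left out so that only the 200 / 401 / default mapping is shown.

```go
package main

import (
	"bytes"
	"encoding/base64"
	"encoding/json"
	"fmt"
	"net/http"
)

// isStructuralPAT reports whether candidate is base64 that decodes to
// "<digits>:<bytes>". Requiring a non-empty part after the colon is an
// assumption of this sketch.
func isStructuralPAT(candidate string) bool {
	decoded, err := base64.StdEncoding.DecodeString(candidate)
	if err != nil {
		return false
	}
	idx := bytes.IndexByte(decoded, ':')
	if idx <= 0 || idx == len(decoded)-1 {
		return false // no colon, empty numeric id, or nothing after the colon
	}
	for _, b := range decoded[:idx] {
		if b < '0' || b > '9' {
			return false
		}
	}
	return true
}

// interpretResponse mirrors the mapping stated in the PR description:
// 200 -> (true, decoded JSON body, nil), 401 -> (false, nil, nil),
// anything else -> (false, nil, error). Treating a JSON decode failure
// as an error is an assumption.
func interpretResponse(status int, body []byte) (bool, map[string]any, error) {
	switch status {
	case http.StatusOK:
		var decoded map[string]any
		if err := json.Unmarshal(body, &decoded); err != nil {
			return false, nil, fmt.Errorf("decoding response body: %w", err)
		}
		return true, decoded, nil
	case http.StatusUnauthorized:
		return false, nil, nil
	default:
		return false, nil, fmt.Errorf("unexpected HTTP status %d", status)
	}
}

func main() {
	token := base64.StdEncoding.EncodeToString([]byte("12345:abc"))
	fmt.Println(token, isStructuralPAT(token))

	ok, body, err := interpretResponse(http.StatusOK, []byte(`{"displayName":"Ada"}`))
	fmt.Println(ok, body["displayName"], err)
}
```

Returning the decoded map lets a caller such as the Jira detector read displayName and emailAddress, while callers that do not need the body can ignore it.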