When Convenience Becomes a Backdoor

Phuoc Nguyen - February 10, 2025 - 11 min read

The story starts with a "great" idea

"Hey, I have this great idea. Let's create a Google Chat bot, connect it to Pub/Sub, and from there call into our prod backend to restart services, check logs, clear cache. Super convenient! We can do maintenance from anywhere, no VPN needed."

Sound familiar?

In a team meeting, a developer proposed this idea with genuine enthusiasm. And to be fair, from a pure productivity perspective, it sounds very appealing:

  • No VPN needed - Can restart services from a coffee shop
  • No company laptop required - Just chat from your phone
  • No complex commands to remember - Just type /restart service-x in chat
  • Real-time notifications - Bot responds immediately when done

Who doesn't love convenience?

But there's a problem: what developers call "convenience," security engineers call a "backdoor".

Calling things by their true name

Let's draw out the architecture the developer proposed:

Developer (anywhere, any network)
  → Google Chat (public internet)
    → Google Pub/Sub
      → Production Backend (executes maintenance commands)

Now, compare with the architecture of a C2 (Command & Control) channel - what hackers build after compromising a system:

Attacker (anywhere, any network)
  → Public Cloud Service (to bypass firewall)
    → Message Queue / Webhook
      → Compromised Backend (executes malicious commands)

Look similar?

In 2024, APT (Advanced Persistent Threat) groups such as APT41 were reported abusing Google Cloud services, including Pub/Sub, to build C2 channels. The reason? Traffic to *.googleapis.com is usually whitelisted and not inspected.

The only difference between a "maintenance bot" and a "C2 channel"? Who built it. But from a security perspective, they have the same attack surface.

Why "works from anywhere" is a red flag

Developer says: "Convenient because I can use it from anywhere!"

Security engineer hears: "Attacker can attack from anywhere."

Let's analyze this deeper.

Bypasses the entire security perimeter

A serious company's production backend is typically protected by multiple layers:

  1. Network perimeter - Firewall, VPC, private subnets
  2. Access control - VPN, bastion host, IP whitelist
  3. Identity verification - MFA, certificate-based auth
  4. Device compliance - Only allows enrolled MDM devices
  5. Session management - Timeout, recording, audit

The chatbot via Pub/Sub? Bypasses all of them.

A message from Google Chat goes through Google's infrastructure, not your firewall. It doesn't need VPN, doesn't need IP whitelist, doesn't need device compliance. As long as it has permission to publish to the Pub/Sub topic, the message goes straight to production.

Authentication based on... Google Chat membership?

Who has permission to send commands to production?

With VPN + bastion: Someone with VPN credentials + SSH key + whitelisted IP + passed MFA.

With chatbot: Anyone added to that Google Chat space.

A new intern? Added to the space to "learn," now has permission to restart production. A temporary contractor? Still in the space after their contract ended. A developer whose Google Workspace account was compromised? Attacker has a direct path to prod.

No granular authorization

A properly designed maintenance REST API can control:

  • User A can only restart service X
  • User B can only view logs, not modify
  • User C can only operate on staging
  • Any production action requires approval from 2 people

With chatbot? Usually all-or-nothing. Everyone in the space can type the same commands, with the same permissions.
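
To make the contrast concrete, here is a minimal sketch of what granular authorization could look like behind a maintenance API. Everything in it is hypothetical: the `Rule` and `Policy` names, the `"*"` wildcard, and the two-approver default are assumptions chosen to mirror the list above, not part of any real product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Rule:
    """One permission: who may run which action on which service, in which environment."""
    user: str
    action: str        # e.g. "restart" or "view-logs"
    service: str       # "*" is a hypothetical wildcard meaning "any service"
    environment: str   # "staging" or "production"


@dataclass
class Policy:
    rules: set = field(default_factory=set)
    production_approvals: int = 2  # assumed two-person rule for production

    def is_permitted(self, user, action, service, environment):
        """True if at least one rule covers this user, action, service and environment."""
        return any(
            r.user == user
            and r.action == action
            and r.environment == environment
            and r.service in ("*", service)
            for r in self.rules
        )

    def may_execute(self, user, action, service, environment, approvers=()):
        """Apply the rule check first, then the approval requirement for production."""
        if not self.is_permitted(user, action, service, environment):
            return False
        if environment != "production":
            return True
        # The requester never counts as one of their own approvers.
        independent = set(approvers) - {user}
        return len(independent) >= self.production_approvals
```

The point of the sketch is that the decision depends on who is asking, what they are asking for, and where, rather than on membership in a chat space. A real system would also verify that the approvers themselves are allowed to approve.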

Real attack scenarios

Don't think this is theoretical. Here are three completely plausible scenarios:

Scenario 1: Compromised Developer Account

8:00 AM - Developer Mike receives a phishing email that looks like a notification from Google Workspace: "Unusual sign-in detected. Click here to verify."

8:02 AM - Mike clicks the link, enters credentials. This is a fake page. Attacker has username/password.

8:05 AM - Attacker logs into Mike's Google Workspace. No separate MFA for Google Chat.

8:10 AM - Attacker opens Google Chat, sees the "Production Maintenance" space. Types /help.

8:11 AM - Bot responds with a list of commands: /restart, /logs, /db-query, /clear-cache, /deploy...

8:15 AM - Attacker: /db-query SELECT * FROM users LIMIT 1000

8:16 AM - Bot returns 1000 user records with email, phone, hashed passwords.

8:20 AM - Attacker: /deploy malicious-service:latest

8:21 AM - Malware deployed to production.

Total time from phishing to full production access: 21 minutes.

No zero-day exploit needed. No firewall bypass needed. No lateral movement needed. Just one click on a phishing link.

Scenario 2: Insider Threat

Monday - Developer James receives notice of layoff, effective Friday.

Tuesday - James still has access to everything, including the maintenance chat space.

Wednesday - James starts "backing up personal data." Actually exporting data via the chatbot.

Thursday - James types /db-query SELECT * FROM transactions WHERE amount > 1000000

Friday - James leaves the company with a database dump containing all large transactions.

With proper access control, James's permissions would have been revoked immediately upon the layoff decision. With chatbot? As long as he's still in the Google Chat space, he still has access.
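
One way to picture "proper access control" here is access that expires by default and can be revoked centrally, checked on every command. The sketch below is only an illustration under that assumption; `Grant` and `GrantStore` are hypothetical names, and a real deployment would back this with the company's identity provider rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Grant:
    user: str
    expires_at: datetime
    revoked: bool = False


class GrantStore:
    """Hypothetical in-memory store of time-bound production access grants."""

    def __init__(self):
        self._grants = {}

    def grant(self, user, ttl, now):
        # Access is always issued with an expiry; there is no permanent grant.
        self._grants[user] = Grant(user=user, expires_at=now + ttl)

    def revoke(self, user):
        # Called by HR/IT offboarding the moment the decision is made.
        existing = self._grants.get(user)
        if existing is not None:
            existing.revoked = True

    def is_active(self, user, now):
        # Checked on every command, so revocation takes effect immediately.
        existing = self._grants.get(user)
        return existing is not None and not existing.revoked and now < existing.expires_at
```

In James's case, a single `revoke("james")` on Monday would have closed the path, and even without it the grant would have lapsed on its own.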

Scenario 3: Social Engineering

Attacker calls IT support: "I'm Lisa from the Platform team, just joined last week. Can you add me to the Production Maintenance space? My ID is lisa.nguyen@company.com"

IT support checks: There is indeed a Lisa Nguyen who joined last week (public information on LinkedIn).

IT support adds the address to the space. But it doesn't belong to the real Lisa: it's a lookalike account the attacker created under a similar name.

Attacker now has production access.

Specific technical vulnerabilities

Beyond process issues, there are specific technical risks:

1. Google Pub/Sub Supply Chain Vulnerabilities

According to research from GitHub Security Advisory and Snyk, Google Pub/Sub client libraries have a history of vulnerabilities:

| CVE | Severity | Issue |
|---|---|---|
| CVE-2020-7720 | Critical (9.8) | Prototype Pollution in node-forge (dependency chain) |
| GHSA-7v5v-9h63-cj86 | Medium (6.9) | DoS in @grpc/grpc-js |
| Multiple other CVEs | Varies | Signature forgery, Prototype Pollution in dependencies |

Even the Java client google-cloud-pubsub:1.108.1 contains over 14 CVEs from dependencies.

This means: even if you don't make configuration mistakes, supply chain attacks are still a real risk.

2. Traffic Invisible to Security Tools

According to reports from Intel 471 and Active Countermeasures:

"Traffic to *.googleapis.com is typically whitelisted in corporate firewalls and not subject to DPI (Deep Packet Inspection). This makes Pub/Sub an ideal C2 channel for threat actors."

Your security team may be monitoring all traffic in and out, but completely blind to traffic via Google Cloud services.

3. Message Queue = Delayed Execution = Evasion

With a REST API, requests are processed immediately. Security tools can correlate each request with its response and detect anomalies in real time.

With Pub/Sub, messages can be queued, processed later. This allows:

  • Time-based evasion: Send commands at 3 AM when no one's monitoring
  • Batch evasion: Send many small commands instead of one large easily-detected command
  • Retry exploitation: If a message isn't acknowledged, Pub/Sub redelivers it automatically, so a failed command can run again
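
If a queue sits in the path anyway, the consumer can at least refuse old and repeated commands. The following is a rough sketch of that idea, not Pub/Sub client code: `CommandMessage` is a stand-in that mirrors the message ID and publish time Pub/Sub attaches to each message, and the two-minute `max_age` is an arbitrary assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class CommandMessage:
    """Stand-in for a queued message: ID and publish time as set by the broker."""
    message_id: str
    publish_time: datetime
    command: str


class DeliveryGuard:
    """Rejects commands that are too old or that have already been processed."""

    def __init__(self, max_age=timedelta(minutes=2)):
        self.max_age = max_age
        self._seen_ids = set()  # in-memory only; a real consumer would persist this

    def check(self, msg, now):
        # A maintenance command that waited in the queue is no longer trustworthy.
        if now - msg.publish_time > self.max_age:
            return "rejected: stale"
        # A redelivered message must not execute the same command twice.
        if msg.message_id in self._seen_ids:
            return "rejected: duplicate"
        self._seen_ids.add(msg.message_id)
        return "accepted"
```

This narrows the delayed-execution and retry problems, but it does nothing about a well-timed command sent at 3 AM; that still needs monitoring and approval.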

4. No Input Validation Layer

Chat messages are free-text. If the bot parses and executes:

User: /db-query SELECT * FROM users; DROP TABLE users; --

If the bot doesn't sanitize input properly → SQL Injection directly into production.

With REST API, you have middleware layers to validate, sanitize, escape. With a chatbot parsing free-text? Everything depends on the bot's code.
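
If a bot has to parse chat text at all, the safer pattern is to accept only a fixed grammar and reject everything else, rather than trying to sanitize free-form input. Here is a minimal sketch of that approach. The allowed actions, the service-name pattern, and the `/<action> <service> <environment>` shape are assumptions for illustration; note that there is deliberately no `/db-query` equivalent.

```python
import re
from dataclasses import dataclass

# Hypothetical allowlists: only pre-defined actions, no arbitrary SQL or shell.
ALLOWED_ACTIONS = {"restart", "clear-cache", "logs"}
ALLOWED_ENVIRONMENTS = {"staging", "production"}
SERVICE_NAME = re.compile(r"[a-z][a-z0-9-]{0,62}")


@dataclass(frozen=True)
class Command:
    action: str
    service: str
    environment: str


def parse_command(text: str) -> Command:
    """Parse '/<action> <service> <environment>' or raise ValueError."""
    parts = text.strip().split()
    if len(parts) != 3 or not parts[0].startswith("/"):
        raise ValueError("expected: /<action> <service> <environment>")
    action, service, environment = parts[0][1:], parts[1], parts[2]
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if not SERVICE_NAME.fullmatch(service):
        raise ValueError(f"invalid service name: {service!r}")
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    return Command(action, service, environment)
```

With this parser, `/restart user-service production` yields a structured `Command`, while the `/db-query ... DROP TABLE users` message above is rejected before anything reaches the backend.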

Context: Cyber attacks in 2024-2025

This isn't a theoretical concern. Look at reality:

Alarming numbers

  • 59% of surveyed organizations were hit by ransomware in the preceding year (Sophos State of Ransomware 2024)
  • Median dwell time (how long an attacker stays in a system before detection): 10 days (Mandiant M-Trends 2024)
  • Phishing remains attack vector #1, accounting for 36% of successful attacks (Verizon DBIR 2024)

Notable attacks involving cloud services

SolarWinds (2020) - Attacker compromised supply chain, planted backdoor in software update. Affected 18,000+ organizations including US Government agencies.

Codecov (2021) - Attacker modified bash uploader script to exfiltrate credentials from CI/CD pipelines. Affected thousands of companies.

Okta (2022) - Attacker compromised support engineer's laptop, from there accessed customer tenants.

CircleCI (2023) - Attacker compromised engineer credentials, stole customer secrets from CI/CD.

Microsoft Storm-0558 (2023) - Attacker stole signing key, forged authentication tokens, accessed US Government email.

Common pattern? Attackers don't hack directly into production. They find the roundabout way - supply chain, third-party services, compromised credentials.

A chatbot connected to production? That's exactly the roundabout way attackers dream of.

The right way to solve the real need

The need to "maintain systems remotely and quickly" is completely legitimate. Developers aren't wrong for wanting convenience. But there are ways to achieve convenience without sacrificing security.

Option 1: VPN + Cloud IAP + PAM

Developer (anywhere)
  → VPN (encrypted tunnel, device compliance check)
    → Cloud IAP (Identity-Aware Proxy, MFA)
      → Bastion Host (session recording, audit)
        → Production (limited commands, time-bound access)

Tools: Google Cloud IAP, HashiCorp Boundary, Teleport, AWS SSM Session Manager

Option 2: GitOps for Maintenance Tasks

Developer (anywhere)
  → Git commit (maintenance task definition)
    → Pull Request (peer review, approval)
      → CI/CD Pipeline (with proper controls)
        → Production (automated, audited execution)

Tools: ArgoCD, Flux, Terraform, Ansible AWX

Option 3: Runbook Automation

Developer (anywhere)
  → Runbook Platform (authn, authz, audit)
    → Pre-defined Actions (limited scope)
      → Production (controlled execution)

Tools: PagerDuty Rundeck, Shoreline, AWS Systems Manager Automation

Option 4: ChatOps - But Done Right

If you really want to use a chat interface, here's how to do it properly:

Safe ChatOps principles:

  1. Bot MUST NOT execute directly - Only trigger existing pipelines
  2. Approval workflow mandatory - Every production action needs at least 1 person to approve
  3. Limited command set - Only pre-defined actions, no arbitrary commands
  4. Internal network only - If using Slack/Chat, enforce device policies
  5. Audit logging - Every action must log to SIEM
  6. Rate limiting - Limit commands per user per hour
  7. Production commands = extra friction - Staging can be easy, production must be harder
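
As one small illustration of principle 6, a per-user sliding-window limiter could look roughly like this. It is a sketch under simple assumptions: the `RateLimiter` name is hypothetical, state lives in memory, and timestamps are passed in as seconds so the logic is easy to test.

```python
from collections import defaultdict, deque


class RateLimiter:
    """Allows at most max_commands per user within a sliding time window."""

    def __init__(self, max_commands, window_seconds):
        self.max_commands = max_commands
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # user -> timestamps of accepted commands

    def allow(self, user, now):
        timestamps = self._history[user]
        # Forget commands that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_commands:
            return False
        timestamps.append(now)
        return True
```

A limit like this won't stop a determined attacker, but it slows down bulk extraction and gives the audit log time to raise an alert.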

Example of proper flow:

Developer: /restart user-service production

Bot: ⚠️ Production action requires approval.
     Reason required. Please reply with: /reason <your reason>

Developer: /reason Memory leak detected, need restart to recover

Bot: 📝 Request logged. Notifying @oncall-lead for approval.
     Request ID: REQ-20240215-001
     Action: restart user-service
     Environment: production
     Requester: dev@company.com
     Reason: Memory leak detected, need restart to recover

Oncall Lead: /approve REQ-20240215-001

Bot: ✅ Approved by oncall@company.com
     Triggering restart via CI/CD pipeline...

Bot: ✅ user-service restarted successfully
     Logs: https://internal-logs.company.com/REQ-20240215-001
     Duration: 45 seconds
     Health check: PASSED

Compare with the wrong way:

Developer: /restart user-service production

Bot: ✅ Done. user-service restarted.

See the difference?
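
The approved flow above is essentially a small state machine: a request needs a reason, then an independent approval, and only then does the bot hand off to a pipeline. Here is a hedged sketch of that logic. The `Request` class, the status names, and the `trigger_pipeline` callback are all hypothetical; in practice the callback would call whatever CI/CD system you already have.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    NEEDS_REASON = "needs_reason"
    PENDING_APPROVAL = "pending_approval"
    APPROVED = "approved"


@dataclass
class Request:
    request_id: str
    requester: str
    action: str
    service: str
    environment: str
    reason: str = ""
    approver: str = ""
    status: Status = Status.NEEDS_REASON
    audit_log: list = field(default_factory=list)  # would be shipped to a SIEM

    def add_reason(self, user, reason):
        if self.status is not Status.NEEDS_REASON:
            raise RuntimeError("reason already recorded")
        if user != self.requester:
            raise PermissionError("only the requester can give the reason")
        if not reason.strip():
            raise ValueError("a reason is required")
        self.reason = reason.strip()
        self.status = Status.PENDING_APPROVAL
        self.audit_log.append(f"reason by {user}: {self.reason}")

    def approve(self, approver, trigger_pipeline):
        if self.status is not Status.PENDING_APPROVAL:
            raise RuntimeError("request is not awaiting approval")
        if approver == self.requester:
            raise PermissionError("requester cannot approve their own request")
        self.approver = approver
        self.status = Status.APPROVED
        self.audit_log.append(f"approved by {approver}")
        # The bot never executes anything itself; it only triggers a pipeline.
        trigger_pipeline(self)
```

The "wrong way" corresponds to calling the pipeline directly from the first message, with no reason, no second person, and no log entry.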

Conclusion: Friction is a Feature

Back to the story at the beginning. The developer wanted to "do maintenance from anywhere, no VPN needed."

In security, friction is a feature, not a bug.

Having to VPN in, SSH through bastion, authenticate MFA, wait for approval - each step is an intentional checkpoint. It forces the person to:

  • Be deliberate and conscious about what they're doing
  • Be in a controlled environment
  • Have time to reconsider before executing
  • Leave a trail for others to review

When you remove those frictions for "convenience," you also remove safeguards that were designed for a reason.

A trustworthy production system isn't the one that's easiest to access. It's the one that grants access only to the right people, at the right time, in the right way.

Next time someone proposes a "convenient" solution to access production, ask yourself:

"If a threat actor had what this developer has, what could they do?"

The answer to that question is the true level of risk of the solution.


Appendix

A. Pub/Sub Security References

B. Recommended Access Management Tools

| Tool | Use Case | Link |
|---|---|---|
| Google Cloud IAP | Identity-Aware Proxy for SSH/RDP | cloud.google.com/iap |
| HashiCorp Boundary | Secure remote access | boundaryproject.io |
| Teleport | Zero-trust access platform | goteleport.com |
| AWS SSM Session Manager | AWS native session management | docs.aws.amazon.com |
| Rundeck | Runbook automation | rundeck.com |

C. Security Frameworks References

  • NIST Cybersecurity Framework 2.0
  • CIS Controls v8
  • OWASP Top 10 for Cloud
  • MITRE ATT&CK for Enterprise