PODCAST · technology
Cultivating Security
by Cultivating Security
Deep examinations of industry incidents, vendor risk, and operational security decisions from 25+ years in the field. AI-narrated episodes transform written analysis into practical insights for security professionals who need to understand what really happens when security meets operational reality. No certifications required, just real-world experience.
-
20
Week 14: What I Wish Someone Had Told Me
Twelve weeks ago, we started this series talking about a gap in how people learn security work. You can get certified, read the frameworks, know the technical fundamentals—and still walk into your first real security role completely unprepared for how the work actually functions. Nobody teaches you the organizational part. The political part. The part where your technically perfect solution dies in a budget meeting. The part where you discover that half your environment isn’t documented and everyone just works around it. We’ve spent twelve weeks filling that gap. Talking about the realities that textbooks don’t cover and certifications don’t test for. The organizational dynamics, the political navigation, the pragmatic trade-offs, the gap between theory and practice. These are the things I wish someone had told me earlier in my career. Or maybe they did try to tell me, and I wasn’t ready to hear it yet. Sometimes the lesson doesn’t land until you’ve seen enough to recognize what it means. Some of this I learned through mistakes—my own and others’. Some through painful experience during incidents, failed projects, and organizational friction. Some through watching seasoned practitioners navigate situations I didn’t understand yet. And some through finally understanding what a mentor or manager had been trying to tell me for months, once I had the context to make sense of it. I can’t give you the experience—that you have to earn yourself. Experience is what engrains these lessons in ways that reading never can. But what I can give you is context. Frameworks for understanding what you’re experiencing, so the lessons land faster and with less confusion. Explanations for why your boss, your manager, your lead, your VP does the things they do—not because they’re wrong or don’t care about security, but because they’re operating with constraints and pressures you might not see yet. And maybe—hopefully—this series will help reduce some of the stress. The frustration of proposing good ideas that go nowhere. The confusion when leadership makes decisions that seem obviously wrong. The exhaustion of fighting battles that never seem to end. If you understand the organizational dynamics, the political realities, the resource constraints, the competing priorities—it doesn’t make the work easy, but it makes it make sense. And when it makes sense, you can try different approaches. Different framing. Different timing. Different tactics. You’ll have tools, techniques, and methods to try when your first approach doesn’t get traction. Not because the first approach was wrong, but because you’ll understand why it didn’t work and what might work better given the specific organizational context you’re in. The work is still hard. But understanding why it’s hard—and having strategies for navigating that difficulty—makes it sustainable in ways that just grinding through without context never is. I can’t give you the experience—that you have to earn yourself. But I can give you the framework for understanding what you’re experiencing, so the lessons land faster and hurt less. The Patterns That Kept Appearing If you’ve been following this series from the beginning, you noticed certain themes surfacing repeatedly. That wasn’t accidental. These are the patterns that underpin how security work actually gets done: Understanding before securing. We started with asset inventory and environmental knowledge (Week 1) because you can’t protect what you don’t know exists. But that principle echoed through every subsequent week. You can’t manage risk in systems you haven’t inventoried. You can’t build detection for attacks you have no visibility into. You can’t secure identities you haven’t cataloged. You can’t assess vendor risk without understanding what access they have. You can’t navigate organizational politics if you don’t understand the business context. It all starts with understanding what you’re actually working with. Not what the documentation says. Not what people think is there. What’s actually there. Fort Knox isn’t the goal. We covered this explicitly in Week 2, but it came up again in Week 7 (why security projects fail), Week 10 (when best practices don’t apply), and everywhere we talked about pragmatic trade-offs. Perfect security is impossible and often counterproductive. Your job is managing risk proportionally within realistic constraints, not eliminating all risk. Learning to calibrate your risk tolerance to organizational reality—to distinguish between “this is dangerous and unacceptable” and “this is uncomfortable but pragmatic given our constraints”—that’s professional growth. It’s also what prevents burnout when you realize you can’t achieve the textbook ideal. Documentation is operational, not bureaucratic. I recommended starting a risk register in Week 1, and it kept proving useful throughout the series—critical in Week 2, essential in Week 6, valuable again in Weeks 10 and 11. It’s not compliance theater. It’s how you track what you’ve found, what the organization has accepted, what you’re working on, and what you’re carrying forward. It’s how you demonstrate progress over time. It’s how you protect yourself from inheriting responsibility for decisions made years before you arrived. But it’s also a working tool for prioritization. When you choose a risk assessment methodology—and we didn’t cover that in this series, but you’ll need one—your risk register becomes the map that shows you what to address first, what to tackle next, what’s lower priority but still needs eventual attention. It helps you move the program forward strategically instead of just reacting to whatever’s loudest or most recent. Without it, you’re operating from memory and reacting to pressure. With it, you have a coherent view of your security posture and a rational basis for prioritizing work. When someone asks “what security issues do we have,” you have an answer. When leadership accepts a risk you’re uncomfortable with, you document it and move on. When you’re trying to show improvement year over year, you have the receipts. Incremental progress beats perfect plans. This showed up everywhere—Week 2’s risk management, Week 7’s project failures, Week 10’s pragmatic trade-offs. You won’t fix everything at once. You won’t get unlimited resources. You won’t have perfect organizational support. But consistent, demonstrable improvement over time? That’s achievable. That’s what separates effective security practitioners from people who burn out fighting for impossible standards. Close ten risks this quarter. Close twelve next quarter. That’s progress. The risk register still has 150 items? Sure. But it had 172 six months ago. You’re moving in the right direction. And here’s the thing: not all 150 items are critical. If they are, you probably need to revisit how you rank and categorize risks—your methodology might be inflating everything. More likely, the majority are actually lower risk. They’re still risks you need to treat (compensating controls where possible, documentation where you can’t), but they’re not drop-everything urgent. Look at what you actually closed: this quarter, 12 items—half were critical, 2 were medium, 4 were low. Prior quarter, maybe you closed 10 items—3 critical, 5 medium, 2 low. That’s tangible risk reduction in a short period of time. You’re not just moving items off a list—you’re systematically reducing your organization’s exposure to the risks that actually matter most. The critical findings are getting addressed. The high-severity gaps are closing. The attack surface is shrinking in the areas that count. That’s the kind of progress leadership can understand and you can be proud of. And it’s only visible if you’re tracking it properly. Organizational literacy matters as much as technical skill. Understanding how decisions get made, who influences them, what pressures leadership faces, how to communicate in business terms, when to fight and when to document and move on—these came up in Week 6 (reporting through IT), Week 7 (why projects fail), Week 8 (reading the room), and honestly everywhere. Technical competence gets you in the door. Organizational effectiveness determines whether you can actually get security work done. You can be the smartest security person in the room and still accomplish nothing if you don’t understand how to operate in the organization you’re actually in. Vendor promises and reality rarely align. Week 5 covered this extensively, but it echoed in Week 3’s visibility gaps and Week 4’s identity sprawl. Whether it’s logging capabilities, incident response commitments, authentication mechanisms, or SLA definitions—verify everything, get commitments in writing, plan for failure. This isn’t cynicism. It’s professionalism. Vendors are businesses with business objectives. Their incentives aren’t perfectly aligned with yours. Understanding that dynamic helps you manage the relationship effectively instead of being repeatedly surprised when they don’t deliver what you expected. Compliance and security aren’t the same thing. Week 9 made this explicit, but the tension appeared in Week 6 (using compliance as a forcing function), Week 8 (understanding what your CISO actually cares about), and Week 10 (when best practices don’t apply). Passing an audit doesn’t mean you’re secure. Compliance frameworks test for specific controls at a point in time—they don’t evaluate your comprehensive risk posture. But compliance requirements can be useful. They create forcing functions that get resources allocated. They provide deadlines that security risk assessments often don’t. Use them strategically, but don’t let them define your entire program. Pattern recognition comes from repetition. Week 12 was about learning from public breaches, but the principle applies more broadly. After you see enough incidents, enough vendor failures, enough project dynamics, enough organizational patterns—you start recognizing them earlier. You develop intuition for what’s likely to succeed versus what’s going to struggle. You get better at predicting which risks matter and which ones are theoretical. That intuition can’t be taught directly. But it can be accelerated by deliberately learning from others’ experiences instead of only your own. What We Didn’t Cover (And Why) This series was never meant to be comprehensive. It was focused on a specific gap: the organizational and practical realities of security work that don’t get taught in formal programs. We didn’t do technical deep-dives. How to configure a SIEM. How to write detection rules. How to perform threat modeling. How to implement zero trust architecture. Not because those aren’t important—they absolutely are—but because there are plenty of good resources for learning those things. The gap isn’t technical knowledge. It’s organizational context. We didn’t do tool recommendations or vendor comparisons. Tools change constantly. What’s cutting-edge today is legacy tomorrow. The principles of what you need (visibility, detection, response capability, identity management) don’t change. The specific products that deliver those capabilities change all the time. Besides, the right tool depends on your environment, your constraints, your use cases—there’s no universal answer. We didn’t cover advanced topics like threat hunting, red teaming, sophisticated detection engineering, or security research. Not because those aren’t valuable career paths, but because this series was aimed at people 1-5 years into security work who are still building foundational organizational literacy. Those advanced topics come later, and they build on the foundations we’ve covered here. We didn’t go deep on specific compliance frameworks or regulations. GDPR, HIPAA, PCI-DSS, SOC 2, ISO 27001—the specifics vary, but the organizational dynamics we covered in Week 9 apply broadly. The tension between compliance and security, the way audits work, the importance of documentation, the scope boundaries—those patterns repeat regardless of which framework you’re working with. We didn’t cover career progression, salary negotiation, resume building, interview preparation, or other career development topics. Important? Yes. But outside the scope of “how to actually do security work effectively once you’re in the role.” The focus was deliberate: the organizational, political, and practical realities of security work that usually take a decade of painful experience to learn. Everything else—the technical skills, the tool knowledge, the advanced specializations—you can find elsewhere or will learn as you need them. The Skills That Compound Some skills plateau relatively quickly. You learn a technology, you get proficient, you maintain that proficiency. That’s valuable, but it doesn’t keep growing exponentially. Other skills compound over time. They get more valuable the longer you practice them, and they apply across changing contexts even as technologies and tools evolve. Communication skills compound. Learning to translate technical risk into business impact, to tailor messages for different audiences, to advocate for security work without being preachy or alarmist—these skills improve with practice and apply regardless of what security domain you’re working in or what technologies you’re using. Ten years from now, the specific security tools will be different. The need to communicate effectively with non-technical stakeholders will be exactly the same. Organizational navigation skills compound. Understanding how decisions get made, building relationships with stakeholders, recognizing political dynamics, knowing when to push and when to document and move on—these get easier with experience and apply across different organizations and roles. You’re building pattern recognition for organizational behavior that transfers. Judgment and prioritization skills compound. Learning to distinguish between critical risks and theoretical concerns, knowing what’s worth fighting for and what’s worth accepting, making intelligent trade-offs with imperfect information—this is the skill that separates senior practitioners from junior ones. It takes years to develop because it requires seeing enough situations to build reliable intuition. Systems thinking compounds. Understanding how pieces fit together, recognizing second-order effects, seeing patterns across seemingly different problems—this improves continuously as you accumulate more context and experience. The person who can see how an identity sprawl problem (Week 4) connects to vendor risk (Week 5) connects to organizational structure (Week 6) connects to compliance requirements (Week 9) is thinking systemically. That’s a skill that develops over years and applies broadly. Relationship building compounds. The trust you build with colleagues, the credibility you establish with leadership, the reputation you develop in your professional community—these accumulate over time and make future work easier. The security person who’s known for being reasonable, competent, and effective gets more organizational support than someone with identical technical skills but no relational capital. Technical skills matter. You need them. But they’re also more volatile—what’s cutting-edge today is outdated in five years. The compounding skills we’ve focused on throughout this series? Those stay valuable throughout your entire career. Knowing When to Stay, When to Move Not every organization is a good fit for every practitioner. And not every challenge is worth pushing through. But let me be clear up front: this section isn’t about quitting security. It’s about recognizing when you need to move to a different organization to continue growing in this field. The work matters. You staying in this field matters. Sometimes that means finding an environment where you can actually do the work effectively. Some situations are hard but developmental. You’re learning, building skills, making progress even if it’s slower than you’d like. The organization has constraints and you’re figuring out how to operate within them. Leadership isn’t perfect but they’re reachable. Resources are limited but you’re able to demonstrate value and make incremental improvements. You’re frustrated sometimes, but you can see a path forward. That’s worth staying for. That’s where you build the organizational literacy and political skills that compound over time. Other situations are hard in ways that aren’t developmental—they’re just dysfunctional. You’re hitting the same walls repeatedly. Security work gets killed for reasons that don’t make sense even after you understand the context. You’re excluded from decisions until it’s too late to influence them. Leadership says security matters but their actions show it doesn’t. You’re expected to accept responsibility without being given authority. Your professional opinions are solicited but never actually valued. We talked about some of these patterns in Week 6 (reporting through IT leadership). The questions to ask yourself: Are you making progress, or are you just spinning your wheels? If you’ve been trying the approaches we’ve covered—building relationships, communicating in business terms, demonstrating value, picking your battles—for a year or more and nothing is changing, that tells you something about whether the organization is actually ready to invest in security. Is the situation challenging in ways that are building your skills, or is it just draining your energy without growth? There’s a difference between “this is hard but I’m learning how to navigate it” and “this is dysfunctional and I’m just absorbing damage.” Have you had honest conversations with your management about the walls you’re hitting? Not venting, not just complaining—but explaining the specific obstacles and working together to figure out how to break them down or work around them. “Here’s what I’m trying to accomplish, here’s where I’m getting stuck, here’s what I’ve tried, what are your thoughts on how we might approach this differently?” Sometimes leadership doesn’t realize the structural barriers you’re facing. Sometimes they have context that explains why things are the way they are. Sometimes they can help remove obstacles if they understand what you need. If you haven’t had those conversations, have them before you decide the situation is unfixable. But if you have had them—repeatedly, clearly, professionally—and nothing changes, that’s information. Are the structural problems fixable, or are they fundamental to how the organization operates? Sometimes reporting structure can evolve as you demonstrate value. Sometimes organizational culture is so deeply rooted that one person can’t shift it, and trying will just burn you out. Are you asking for perfection from an organization that’s comfortable with “good enough”? This is a real question you need to sit with. We talked in Week 2 about calibrating your risk tolerance to organizational reality. Some organizations aim for security maturity. Some organizations aim for “adequate enough not to get breached, and we’ll deal with it if we do.” Neither is inherently wrong—they’re different risk appetites serving different business strategies. If you’re pushing for Fort Knox and the organization is comfortable with three locks and a guard dog, that’s a mismatch. Not necessarily because either of you is wrong, but because your expectations don’t align with their reality. That’s a conversation worth having with leadership explicitly: “What’s our actual goal here? Are we trying to be best-in-class, or are we trying to meet baseline requirements and regulatory obligations? I need to understand what success looks like for this organization so I can calibrate my work appropriately.” Are you experiencing physical symptoms from the stress? Losing sleep, anxiety that persists outside work hours, health impacts—these are signals that the situation isn’t sustainable regardless of whether you’re learning. We mentioned this briefly in Week 6, but it’s worth repeating: no job is worth destroying your health. If the organizational dysfunction is affecting you physically, something has to change. Either the situation improves, or you need to be somewhere else. The difference between “this is hard but worthwhile” and “this is damaging and I need to leave” isn’t always immediately clear. Give situations a fair chance. Use the strategies we’ve covered. Build skills and credibility. Have the hard conversations with management about what’s not working and how to improve it. Be honest with yourself about whether you’re asking for perfection or asking for basic functionality. But also be honest about whether you’re making progress or just enduring dysfunction. Moving to a different organization isn’t failure. Sometimes it’s the most professional choice you can make. The field needs you. If you’re in an environment that’s preventing you from growing or actively damaging you, finding a better fit is how you stay in this career long-term. Don’t leave security. But don’t stay in situations that are breaking you, either. The Moments That Make It Worthwhile Security work is hard. There will be days where everything you propose gets shot down. Weeks where you’re underwater on compliance requirements while actual security work gets deprioritized. Months where you’re fighting fires instead of making progress. Incidents that reveal gaps you knew existed but couldn’t get resourced to fix. But there are also moments that make all of it worthwhile. When you prevent something bad from happening—and maybe nobody else even knows. You caught the phishing campaign before it spread. You identified the misconfiguration before it was exploited. You flagged the vendor risk before it became a breach. The threat was real, you stopped it, and it never made headlines because that’s how good security works. When leadership finally understands what you’ve been saying for months. You’ve been raising a risk, documenting it in your risk register, communicating it clearly. And finally—maybe because of a public breach that demonstrates the exact scenario you described, maybe because they’ve accumulated enough context, maybe because the timing is finally right—they get it. And they fund the remediation. That validation feels incredible. When a process you built actually works during a crisis. The incident response plan you documented. The logging you fought to implement. The relationships you built across teams. The escalation paths you defined. When an actual incident happens and everything clicks into place—people know their roles, the documentation is there, the logs exist, the communication flows—that’s satisfying in ways that are hard to describe. When you look at your risk register from a year ago and realize how much you’ve actually closed. Fifty risks remediated. A hundred issues addressed. Technical debt that existed for years, finally fixed. Controls that didn’t exist, now implemented. The work felt incremental day-to-day, but looking back across a year the progress is undeniable. When a junior colleague comes to you with a problem and you can help because you’ve been there. You’ve navigated the same organizational friction. You’ve had the same conversation with that executive. You’ve solved the same technical challenge. And you can save them some of the pain you went through because someone’s finally asking the questions you wish you’d known to ask. When you see evidence that security is becoming part of how the organization thinks, not just a checklist. Development teams including you in design reviews without being told to. Business units asking about security implications before making decisions. Leadership considering security risk alongside other factors instead of treating it as an afterthought. Culture change is slow, but when you see it happening it’s incredibly rewarding. These moments don’t happen every day. Sometimes they don’t happen every month. But they do happen. And when they do, they erase a lot of the hard days. The key is recognizing them when they happen and letting yourself feel good about them. Security success is often invisible—the things that don’t happen, the crises that never occur. Train yourself to notice and acknowledge the wins, even the quiet ones. They’re what keep you going through the challenging stretches. The Long Game Security careers are long. Measured in decades, not years. Where you are right now—1-5 years in—you’re still building foundations. Learning the technologies, yes, but more importantly learning how organizations actually work. How decisions get made. How to communicate effectively. How to navigate politics without becoming political. How to make progress within constraints. This phase is about building competence and credibility. Demonstrating that you understand the business, not just the security. Showing that you can work within realistic constraints, not just advocate for ideal solutions. Establishing relationships and reputation that will make future work easier. At five years, you should have solid technical fundamentals and growing organizational literacy. You understand your environment well. You can identify risks and communicate them effectively. You can implement security controls and demonstrate their value. You’re starting to see patterns and develop intuition. You’re trusted to handle incidents competently. At ten years, you should have strong pattern recognition across multiple domains. You can assess a security program and quickly identify the high-value improvements. You understand organizational dynamics well enough to navigate them strategically. You can mentor others effectively because you’ve seen enough to articulate the lessons clearly. You’re building security programs, not just implementing controls. This isn’t about titles or hierarchy—it’s about capabilities and impact. The ten-year practitioner who’s built robust security programs in constrained environments has skills that compound indefinitely. The person who’s only chased certifications and job titles without building organizational effectiveness hasn’t grown the same way. And here’s something nobody tells you early on: the work gets more satisfying as you get better at it. Not easier—it’s still hard—but more satisfying. Because you’re making bigger impact, seeing the long-term results of work you did years ago, mentoring people who are where you used to be, solving more complex problems that require the experience you’ve accumulated. The first few years can be frustrating because you see all the gaps but you don’t have the organizational positioning or credibility to address them as fast as you want. That gap between what you know should happen and what you can actually make happen—it’s demoralizing sometimes. But as you build competence, credibility, and organizational capital, that gap narrows. Not because the problems get easier, but because you get more effective at solving them. This series covered twelve weeks of content, but it’s really about giving you frameworks for understanding the next decade. The organizational patterns, the political dynamics, the pragmatic trade-offs—these are things that typically take years of painful experience to learn. You’ll still need the experience. But hopefully you’ll recognize what you’re experiencing more quickly, make better sense of it, and extract the lessons with less pain and confusion. What’s Next (And What You Want to Hear About) I’ve built two security programs from the ground up in different-sized companies. I’ve worked for major nationwide organizations and small operations. I’ve been the solo security person and I’ve been part of larger teams. These twelve weeks covered the challenges I see most security practitioners face regardless of organization size—the foundational organizational and political realities that transcend specific industries or company stages. But there are things I didn’t cover. Some because they felt too specific or too advanced for this series. Some because they’re skills I’ve internalized to the point where I don’t think about them consciously anymore—like systematic thinking and how I can hear one scenario, build out the attack vectors in my head, and mentally scroll through defense-in-depth layers to identify gaps. (Actually, that’s a topic we didn’t talk about at all, and maybe we should have.) So here’s what I want to know: What are you facing that we didn’t cover? What topics are top of mind for you right now? What organizational challenges are you hitting that don’t fit neatly into the twelve weeks we just went through? What skills do you wish you had better frameworks for understanding? Maybe it’s how to think systematically about security architecture. Maybe it’s how to build a security culture when you’re not in a leadership position. Maybe it’s how to handle specific political dynamics we touched on but didn’t fully explore. Maybe it’s technical topics where you want the organizational context—not just “how to implement X” but “how to get X funded and adopted in a resistant organization.” Maybe it’s something I’ve never thought about because my experience is different from yours, or because I’ve been doing it so long I don’t realize it’s not obvious. I’m listening. What would be useful? One Last Thing You’re going to make mistakes. You’re going to advocate for things that don’t get funded. You’re going to miss things during incidents. You’re going to communicate risk poorly and watch decisions get made based on incomplete understanding. You’re going to implement controls that don’t work as well as you hoped. You’re going to accept trade-offs you’re uncomfortable with and occasionally rationalize things you shouldn’t. That’s normal. That’s part of learning this work. The difference between effective practitioners and struggling ones isn’t that effective practitioners don’t make mistakes—it’s that they treat mistakes as information. What didn’t work? Why? What would I do differently next time? What pattern am I seeing here that I should watch for in the future? Every failed project teaches you something about organizational dynamics. Every incident reveals gaps in your defenses or your processes. Every awkward conversation with leadership shows you what resonates and what doesn’t. Every vendor disappointment calibrates your expectations. The experience compounds if you let it. If you’re paying attention, reflecting, adjusting based on what you learn. Security work matters. It matters for the organizations that depend on it. It matters for the people whose data you’re protecting. It matters for the broader ecosystem—every organization that gets breached makes the threat landscape worse for everyone else. You’re building something worthwhile. Maybe not quickly. Maybe not perfectly. Maybe not with the resources you wish you had. But you’re building it. The twelve weeks we’ve spent together covered a lot of ground. Environmental understanding, risk management, visibility gaps, identity sprawl, vendor relationships, organizational navigation, project dynamics, executive priorities, compliance tension, pragmatic trade-offs, incident response, and learning from others’ mistakes. But really, it’s been about one thing: how to do security work effectively in the real world, with real constraints, with real people, in real organizations that are messy and imperfect and don’t match what the textbooks describe. You know more now than you did thirteen weeks ago. Not just facts—frameworks for understanding the work you’re doing. Context for the challenges you’re facing. Strategies for operating more effectively. Patterns to watch for. Pitfalls to avoid. The work is still hard. But you’re better equipped for it. I wish someone had told me this stuff earlier. I’m glad I could tell you. Now go build something. The post Week 14: What I Wish Someone Had Told Me appeared first on Cultivating Security.
-
19
Week 13: Learning from Incidents You Didn’t Have
The security community has a gift that we don’t use effectively enough: every major breach becomes public eventually. Companies have to disclose incidents. Researchers analyze and publish findings. Post-mortems get written. We can learn from other organizations’ failures without having to experience them ourselves. But most people don’t extract meaningful lessons from public breaches. They read the headlines, maybe feel a moment of “glad that wasn’t us,” and move on. Or they read the technical details but don’t connect them to their own environment. That’s a missed opportunity. Because the patterns that lead to breaches are often similar across different organizations. The attack techniques that work against one target often work against others. The organizational and cultural failures that allowed an incident to happen probably exist in your organization too. Learning to read public breaches for useful lessons—and applying those lessons to your own environment—is a skill that takes practice. But it’s valuable because it helps you build pattern recognition and intuition without having to learn everything through painful personal experience. What to Look for in Public Breaches When you read about a breach, the headline usually tells you what happened: “Company X suffered data breach affecting Y million customers.” That’s not the useful part. The useful part is understanding how it happened and why. What weaknesses existed that allowed the breach? What organizational or process failures contributed? What could have prevented it or detected it earlier? Not all of this information is available. Companies don’t always release detailed post-mortems. But often there’s enough information available—from the company’s disclosure, from researchers who analyzed the incident, from forensic reports if they’re public—to understand the key factors. Initial access vector. How did the attacker get in? Phishing? Vulnerability in internet-facing system? Compromised credentials? Third-party compromise? This tells you what defenses failed or were absent. Privilege escalation and lateral movement. Once inside, how did the attacker expand their access? Did they find unpatched vulnerabilities? Exploit weak access controls? Find credentials stored insecurely? This tells you what internal controls failed. Dwell time. How long was the attacker present before detection? Days? Weeks? Months? This tells you something about detection capabilities—or lack thereof. What finally triggered detection? Was it internal monitoring? External notification from law enforcement or a third party? A ransom demand? This tells you whether detection worked or the organization got lucky. Data accessed or exfiltrated. What did the attacker actually get? How was it protected (or not)? This tells you about data security practices. Response and remediation. How did the organization respond? How long did containment and recovery take? What mistakes were made? This tells you about incident response maturity. The Pattern Recognition Skill After you’ve read about enough breaches, you start seeing patterns. Certain attack paths are common. Phishing to initial access, credential theft, lateral movement through weak internal controls, eventual access to high-value systems or data. This pattern repeats across different industries and organization types because it works. Certain organizational weaknesses are common. Poor asset inventory leading to unknown or forgotten systems. Inadequate logging making investigation difficult. Over-privileged access enabling lateral movement. Lack of segmentation allowing attackers to reach sensitive systems once they’re inside. Certain cultural or process failures are common. Security updates that don’t get applied because of operational concerns. Security tools that exist but aren’t properly configured or monitored. Security processes that exist on paper but aren’t followed in practice. When you recognize these patterns, you can evaluate whether they exist in your own environment. Not “could we get breached the exact same way” but “do we have the same types of weaknesses that contributed to that breach?” This is more valuable than trying to defend against specific attack techniques. Attack techniques evolve. But organizational weaknesses tend to persist. Translating to Your Environment The question to ask when reading about a breach isn’t “could this exact attack work against us” but “what similar weaknesses do we have?” If a breach happened because of an unpatched internet-facing system: Do we have good visibility into our internet-facing attack surface? Do we have a reliable patching process? Do we know when new systems get exposed to the internet? If a breach happened because of over-privileged service accounts: Do we know what service accounts exist? Do they have more access than necessary? Have we reviewed them recently? If a breach happened because logging wasn’t retained long enough to understand the full scope: How long do we retain logs? Is that adequate for investigation? Do we have gaps in what we log? If a breach happened because a third-party vendor was compromised: How do we assess third-party risk? Do we have visibility into what access third parties have? Do we monitor that access? This translation from “what happened to them” to “what does this mean for us” is where the learning actually happens. What Doesn’t Apply Not every breach lesson is relevant to every organization. If a breach happened because of a weakness in a specific technology or product you don’t use, the specific technical details might not matter to you. But the category of weakness might still be relevant. If a breach happened in a highly regulated industry with requirements that don’t apply to you, some of the lessons might not translate. But organizational and process failures often do translate even across different regulatory environments. If a breach happened at a massive scale and you’re a much smaller organization, some of the systemic issues might not apply. But the fundamental weaknesses often do. The judgment call is distinguishing between lessons that apply broadly versus lessons that are specific to circumstances you don’t share. This requires understanding your own environment well enough to make that judgment. If you don’t know your architecture, your access patterns, your third-party relationships—you can’t evaluate whether a particular breach lesson is relevant. Avoiding Threat Inflation There’s a risk in reading about breaches: everything starts to look like an emergency. “This sophisticated attack campaign targeted our industry. We need to immediately implement defenses against it.” Maybe. Or maybe this is an advanced persistent threat that you’re not actually likely to face, and there are more realistic threats you should be focusing on. Reading about sophisticated attacks is interesting. It’s good to understand what’s possible. But it shouldn’t drive your priorities unless you have specific reason to believe you’re a likely target for that threat. Most organizations get breached through common attack paths, not sophisticated novel techniques. Phishing. Unpatched vulnerabilities. Weak credentials. Misconfigurations. These are the things that actually happen frequently. Sophisticated nation-state attacks make headlines. They’re not what most organizations need to optimize their defenses against. So when you’re learning from public breaches, pay attention to the common patterns, not just the exotic ones. The boring failures that happen repeatedly are more likely to be relevant than the once-in-a-decade sophisticated campaign. The Supply Chain Lesson One pattern that’s become increasingly important: third-party compromise as an attack vector. Organizations get breached through their vendors. Through their software supply chain. Through their business partners. The attacker compromises an organization that has trusted access to the real target, then uses that access to pivot. This is hard to defend against because you don’t fully control the security practices of third parties. But you can at least be aware of the risk and take some mitigation steps. Understand what third parties have access to your environment. What data, what systems, what permissions. Limit that access to what’s actually necessary. Monitor it for anomalies. Assess third-party security practices as best you can. Due diligence, questionnaires, certifications—these aren’t perfect but they’re better than nothing. Have contingency plans for what happens if a critical third party gets compromised. Can you disable their access quickly? Can you operate without them temporarily if necessary? Supply chain risk is one of those lessons that keeps appearing in breach post-mortems. If you’re not thinking about it, you should be. The Detection Gap A common theme in breach post-mortems: the attacker was present for a long time before detection. Sometimes this is because the organization had no detection capabilities at all. More often, it’s because they had detection tools but those tools weren’t configured effectively, weren’t being monitored, or weren’t tuned to detect the specific activity that was happening. The lesson isn’t “buy better detection tools.” It’s “make sure the tools you have are actually useful.” Are you collecting the logs that would reveal common attack techniques? Are those logs being analyzed, or just stored? If you’re generating alerts, is anyone actually responding to them or have they become noise? Detection is only valuable if it actually detects things and if you respond when it does. Having expensive security tools that aren’t properly configured or monitored is security theater. This is one of those lessons that appears over and over. Organizations that got breached often had tools that could have detected the attack if they’d been properly implemented and used. The failure wasn’t technology—it was implementation and process. The Organizational Culture Patterns Some breach post-mortems reveal organizational culture issues that contributed to the incident. Security teams that raised concerns but weren’t listened to. Security processes that existed on paper but were routinely bypassed because they were inconvenient. Security tools that were deployed to check a compliance box but never actually used. These are harder lessons to apply because culture change is hard. But they’re important because they reveal that technical controls are only part of security. Organizational culture and process discipline matter just as much. If your organization routinely prioritizes speed over security, bypasses security reviews, or treats security as an annoying checklist rather than a real concern—you have cultural risk that no amount of technical controls fully addresses. Reading about breaches that happened partly because of cultural failures should prompt honest reflection about your own organization’s culture. The Hindsight Bias Trap When reading about a breach after the fact, it’s easy to think “how did they not see this coming?” Everything looks obvious in hindsight. The warning signs that were missed, the vulnerabilities that should have been patched, the access that should have been revoked. But in real-time, with competing priorities and incomplete information and resource constraints, those decisions probably seemed reasonable. Or at least understandable. This doesn’t mean the decisions were right. But it means you should be humble about judging them, because you’re probably making similar trade-offs in your own environment. The question isn’t “how were they so stupid” but “what similar trade-offs are we making that might look obvious in hindsight if we get breached?” That’s uncomfortable to think about. But it’s more useful than smugness. Putting It Into Practice The framework I’ve described works best when you see it applied to actual incidents. I write detailed breach analyses at cultivatingsecurity.com/category/analysis that walk through this exact process—taking public breach disclosures and extracting actionable lessons for your environment. For example, my analysis of the Marquis Software breach examines how a 40-year-old vendor serving 700+ financial institutions appears to have lacked basic security controls like MFA on VPN accounts, adequate logging, and EDR deployment. The piece walks through: How the attack unfolded and why it took 74 days for Marquis to notify the financial institutions (their direct customers), then 104 days to notify the actual individuals whose data was compromised What the post-breach remediation reveals about control gaps that existed beforehand Why standard vendor due diligence failed to identify these issues How to translate those patterns to your own vendor risk management That’s the level of detail needed to truly extract lessons—more than we can cover in this post. If you want to see the framework in action with specific breach examples, those analyses demonstrate exactly how to move from “here’s what happened” to “here’s what it means for you.” Building Intuition The real value of learning from public breaches is building intuition over time. You start to recognize patterns. You develop a sense for what types of weaknesses are common and consequential. You build mental models of how attacks actually unfold in real environments. This intuition helps you prioritize. It helps you identify risks that matter. It helps you avoid getting distracted by exotic threats that are unlikely to affect you. It also helps you communicate risk more effectively. “Here’s a recent breach that happened because of the same type of weakness we have” is more compelling than abstract risk discussions. But building this intuition requires consistently reading about breaches and thinking critically about what they mean. Not just reading headlines—actually understanding what happened and why. Practical Takeaways Every public breach is a learning opportunity. Most people don’t extract the useful lessons from them. Look for how and why breaches happened, not just what happened. Initial access, lateral movement, detection failures, organizational weaknesses. Recognize patterns across multiple breaches. Common attack paths, common organizational failures, common cultural issues. Translate lessons to your own environment. Not “could this exact attack work” but “do we have similar weaknesses.” Distinguish between lessons that apply broadly and lessons specific to circumstances you don’t share. Avoid threat inflation. Focus on common attack patterns, not exotic sophisticated techniques unless you have reason to believe you’re a target. Pay attention to supply chain risk patterns. Third-party compromise is increasingly common. Detection failures are a recurring theme. Having tools isn’t enough—they need to be configured and monitored effectively. Cultural patterns contribute to breaches. Security processes that exist on paper but aren’t followed in practice create risk. Avoid hindsight bias. Decisions that look obvious afterward were made with incomplete information and competing priorities. Build intuition over time by consistently learning from public incidents. This helps with prioritization and risk communication. Read breach post-mortems not to feel smug but to understand what similar risks exist in your environment and how to address them. The post Week 13: Learning from Incidents You Didn’t Have appeared first on Cultivating Security.
-
18
Week 12: Incident Response Is Half Politics
You’ve planned for incidents. You have a documented incident response plan. You’ve done tabletop exercises. Your team knows their roles. You have runbooks for common scenarios. Then an actual incident happens, and you discover that the plan didn’t account for half of what actually matters. Because incident response isn’t just technical. It’s organizational, political, and human. You’re not just trying to contain and remediate a security issue—you’re managing executive panic, communicating with stakeholders who don’t understand security, making decisions with incomplete information under time pressure, and documenting everything for the inevitable post-incident review. The technical part is hard. The organizational part is often harder. And if you’re not prepared for both, you’re going to struggle even if your technical response is solid. What Actually Happens During Incidents Your incident response plan probably has clean steps: detect, contain, eradicate, recover, lessons learned. Real incidents are messier. You detect something that might be an incident or might be normal but anomalous activity. You don’t know which yet. You need to investigate without making assumptions. You start investigating and realize you don’t have the logs you need. Or the logs you have don’t go back far enough. Or the thing you’re investigating happened in a system you don’t have good visibility into. You think you’ve contained it, but then you find evidence that the attacker had access earlier than you thought. Or broader than you thought. So now your containment boundary was wrong and you have to expand it. You’re trying to eradicate the threat, but you’re not entirely sure you’ve found all the persistence mechanisms. How long do you search before you’re confident enough to say it’s gone? You’re trying to recover, but business stakeholders are pressuring you to restore systems quickly, and you’re trying to balance speed against the risk that you haven’t fully remediated. None of this is clean. All of it involves judgment calls with incomplete information. And all of it is happening while people are watching and asking questions and wanting answers you don’t have yet. Managing Executive Attention Executives care when there’s an incident. Suddenly you have attention from people who normally aren’t involved in security operations. This is both helpful and challenging. Helpful because you might get resources you wouldn’t normally get. Authority to make decisions quickly. Budget for emergency response. Organizational cooperation that would usually take weeks to coordinate. Challenging because executives want answers and certainty, and you often don’t have those yet. They want to know: What happened? How bad is it? When will it be fixed? Are we going to have to notify customers? What’s this going to cost? And your honest answers are often: We don’t know yet. We’re still investigating. It could be anywhere from minor to severe. We can’t estimate time to resolution until we understand the full scope. We’ll know about notification requirements when we know what data was accessed. That’s not satisfying. But it’s honest. And giving false certainty is worse than admitting uncertainty. What helps: Regular updates. Even if you don’t have new information, update stakeholders on what you’re doing. “We’re still analyzing logs from the authentication system. We’ve ruled out X, we’re investigating Y, we expect to have more information in two hours.” Translate technical findings into business impact. Don’t just say “we found lateral movement.” Say “the attacker accessed multiple systems, including ones that contain customer data. We’re working to determine what specific data was accessed.” Set expectations about timelines. If investigation is going to take days, say so. Don’t let executives think this will be resolved in hours just because you don’t want to give bad news. Be honest about what you don’t know. “We don’t know yet” is a legitimate answer. It’s better than speculating or giving false assurance. Have a single point of contact for executive communication. Multiple people giving updates creates confusion and inconsistent messaging. Designate one person to communicate with leadership. The Notification Decision One of the most fraught decisions during an incident is whether you’re required to notify customers, regulators, or the public. This isn’t just a security decision—it’s a legal and business decision. And it needs to be made carefully, with input from legal counsel. But security has to provide the information that drives that decision. What data was accessed? How many people are affected? What’s the evidence for and against data exfiltration? The pressure is to minimize. “We don’t have evidence that data was exfiltrated, so maybe we don’t need to notify.” But absence of evidence isn’t evidence of absence. If the attacker had access and you don’t have comprehensive logging, you might not have evidence even if exfiltration occurred. The conservative approach is to assume the worst case unless you have evidence otherwise. If the attacker had access to customer data and you can’t definitively rule out exfiltration, you probably have to notify. This creates tension with business stakeholders who want to avoid notification because of the cost and reputational damage. Your job is to provide accurate information about what you know and don’t know, and let legal and executive leadership make the decision. But you have to be clear about the uncertainty. If you say “we don’t think data was exfiltrated” and they decide not to notify based on that, and then you later find evidence that it was—that’s a problem. Be precise about what you know, what you don’t know, and what the evidence supports. Documentation Under Pressure You’re supposed to document everything during an incident. Timelines of actions taken, decisions made, evidence collected. This is critical for post-incident analysis and potential legal or regulatory proceedings. In practice, when you’re in the middle of an active incident and everyone’s working frantically, documentation often slips. People forget to log what they did. Decisions get made verbally and nobody writes them down. Evidence gets collected but the chain of custody isn’t properly documented. This is understandable but problematic. After the fact, when you’re trying to reconstruct what happened, incomplete documentation makes that much harder. What helps: Designate someone as scribe. One person whose job during the incident is to document what’s happening. Not doing technical work—just capturing the timeline, decisions, and actions. Here’s a recommendation: if your organization is big enough and the incident grows beyond initial response, get an executive admin or a business analyst from the PMO to help with this. If you force one of your technical team members to be the scribe, they’ll resent being pulled off technical work when their skills are needed elsewhere. But someone who’s good at taking notes and asking clarifying questions can be invaluable here. You’re probably already hours or even days into the incident before you realize you need dedicated documentation support. Once you get that person, take an hour or two to backfill. Go over what happened in the last few hours or days and reconstruct the timeline together. It takes time, but it’s worth it—especially if there’s eventual legal or regulatory scrutiny. Use a shared document or chat channel for incident updates. Something where everything is automatically logged and timestamped. This creates a timeline even if nobody’s actively maintaining documentation. Document decisions with rationale. Not just “we decided to isolate the server” but “we decided to isolate the server because we found evidence of data exfiltration and needed to prevent continued unauthorized access.” Preserve evidence properly. If you’re collecting logs or taking disk images or capturing memory dumps, document chain of custody. This matters if there’s ever legal action. Don’t destroy evidence accidentally. Rebuilding a compromised system cleans up the evidence of how it was compromised. Make sure you’ve collected everything you might need before you wipe and rebuild. The Communication Challenge You’re going to be communicating with different audiences who need different information. Technical team: Detailed technical information. IOCs, attack techniques, affected systems, remediation steps. They need enough detail to do their jobs. Executive leadership: Business impact. What systems are affected, what’s the impact to operations, what’s the potential for customer or regulatory notification, what resources are needed, what’s the timeline. Legal counsel: What data was potentially accessed, what evidence you have, what gaps in visibility exist, what regulatory requirements might apply. Affected users or customers (if notification is required): What happened, what data was potentially affected, what you’re doing about it, what they should do, how they can get more information. Each audience needs different levels of detail and different framing. Explaining attack techniques to executives wastes time. Giving customers vague reassurances without specific information frustrates them. Tailor your communication to the audience. And make sure the messages are consistent—you can’t tell executives one thing and customers something contradictory. The Blame Dynamic When something bad happens, people want to know whose fault it is. This is often counterproductive during incident response. Yes, maybe someone clicked a phishing link. Maybe someone misconfigured a system. Maybe someone disabled a security control that would have prevented this. But during active response, blame doesn’t help. It makes people defensive. It makes them less likely to come forward with information. It creates an environment where people are more worried about protecting themselves than solving the problem. And here’s a critical reason to avoid premature blame: you often don’t have the full picture yet. I’ve worked an incident where we detected two or three credentials being used regularly during the attack. The initial reaction from some stakeholders was to identify and confront those users. But we held off. Through investigation, we were able to confirm that two of those people had their passwords compromised—keylogger, credential stuffing from a breach, something along those lines. They weren’t involved; their credentials were just stolen and used by the attacker. If we’d blamed those people early and pushed for immediate termination, we could have gotten innocent people fired. One of the accounts we could never definitively determine—whether it was willing participation or another compromised credential. My gut says compromised, but we couldn’t prove it the same way we did with the others. Point is: during an active incident, you don’t always know who did what or whether apparent insider activity is actually an insider or just stolen credentials. Making it about blame before you have facts creates injustice and destroys trust. Save the accountability discussion for after the incident is resolved. During the incident, focus on fixing the problem. This requires discipline from leadership. If executives start demanding to know who’s responsible while the incident is still active, that needs to be redirected. “We’ll do a full post-incident review to understand what happened and how to prevent it in the future. Right now we need everyone focused on response.” Blameless post-mortems are a cultural practice worth adopting. Understand what happened, what contributed to it, what can be learned, how to prevent it in the future—without making it about punishing individuals. This creates an environment where people are more honest about mistakes and near-misses, which makes the organization more resilient. When the Plan Doesn’t Fit Your incident response plan probably covers common scenarios. Malware infection. Phishing compromise. DDoS attack. Unauthorized access. Then you get an incident that doesn’t fit any of those patterns. Or fits multiple patterns. Or involves systems or attack techniques your plan didn’t anticipate. Here’s a structural recommendation: you need an overarching incident response framework—the generic process that applies to any incident—and then specific playbooks underneath it for common scenarios. The framework covers the principles: detect, contain, investigate, eradicate, recover, document. The decision-making process. The communication structure. The escalation paths. The playbooks cover specific scenarios: “user clicked phishing link,” “DDoS in progress,” “ransomware detected.” Step-by-step guidance for that particular situation. But here’s the problem with overly prescriptive plans: real incidents don’t stay in neat categories. You might have an incident that involves phishing, credential compromise, and malware. Which playbook are you following? All of them? And if you try to put every possible notification scenario and every regulatory obligation into a single incident response plan, you end up with a 200-page document that nobody will actually use during a crisis. So keep the framework generic enough to be useful regardless of the specific incident type. Use playbooks for common patterns but understand they’re guidance, not rigid scripts. The plan is a starting point, not a script. You still have to adapt to what you’re actually seeing. This is where judgment and experience matter. Understanding principles (contain the threat, collect evidence, minimize impact) rather than just following procedures. Being able to make decisions when the playbook doesn’t give you an answer. And being willing to escalate when you’re out of your depth. If the incident involves sophisticated techniques you don’t have experience with, bring in help. That might be external incident response consultants. That might be specialists from vendors. That might be law enforcement if there are criminal implications. One important note about law enforcement: they’re not there to do your forensics or incident response. If someone committed a crime, they’ll build a case—but only if they believe they can prosecute. Their priorities and timelines are different from yours. They can be valuable partners, but don’t assume they’ll solve your incident for you. You still need your own response capability. Knowing when you need help is itself a valuable skill. The Recovery Pressure During an incident, there’s pressure to restore normal operations as quickly as possible. Every hour that systems are down costs the business money. Users can’t do their jobs. Customers can’t access services. This creates tension with thorough remediation. To be confident you’ve removed the threat, you need time to investigate, clean compromised systems, verify that persistence mechanisms are gone. Rushing this means potentially missing something and having the attacker return. But business stakeholders want systems back up. They want to know why it’s taking so long. They’re weighing the cost of continued downtime against the risk of incomplete remediation. Sometimes the right answer is strategic shutdown—taking systems offline deliberately to enable proper containment. Early in my career, I was working on a resource management team—basically server admins for a divisional office. We were fighting a worm—I can’t remember the exact name 25+ years later, but I remember it was incredibly annoying. We’d clean one system, move to the next, then the next—and before we could finish, the first system would be reinfected. Cat and mouse. Whack-a-mole. Finally I came up with an idea: “Boss, I’m taking down the network for 45 minutes.” “WHAT? NO!” “We can’t get ahead of this worm. If we take the ring down”—yes, I’m old, it was a Token Ring network—”the worm can’t move while we eradicate it. We have a very efficient cleanup process. The problem is the worm moves during the process.” We took the network down. Cleaned every system systematically while the worm couldn’t propagate. Brought it back up clean. Problem solved. The lesson: sometimes you need to create the conditions for successful remediation, even if that means deliberate downtime. But sometimes the right answer is strategic patience—not shutting things down immediately so you can ensure you’ve found everything. Years later, I was working for a retailer. We’d been in incident response for weeks after getting alerts from the card brands about compromised payment cards. We finally found something—confirmed the compromise, started notifications, and identified a system that was actively exfiltrating data. We repositioned sensors and monitoring to watch it. During an update call, an executive demanded to know why we hadn’t shut the system down immediately. I explained that while we’d found one command-and-control server, we couldn’t prove a second one didn’t exist. At that point we’d already lost tens of thousands of cards. Another day with maybe 5,000 more cards exposed wasn’t going to fundamentally change the impact, but it could help us verify we’d found everything. The executive essentially kicked and screamed to shut it down now. But we held the line. We wanted to watch the next exfiltration—what the attacker touched, what commands they issued, just to be certain we had the full picture. It paid off. During the next data exfiltration, the attacker sent a ping to a system we hadn’t suspected. We grabbed a forensic image, quickly analyzed it, and verified it was a silent secondary C2 server that we would have missed if we’d shut down the first one immediately. Then we took both systems offline simultaneously and cut off the attacker’s access completely. We monitored for three days. Not one reconnection attempt. Not one similar pattern. Clean containment. If we’d shut down the first C2 when the executive demanded, the attacker would have still had access through the second one. We’d have thought we were contained, restored operations, and the breach would have continued. The lesson: sometimes you need patience to ensure complete containment, even when stakeholders are demanding immediate action. Your job is to be clear about the trade-offs. “We can restore this system now, but we haven’t fully verified that all malware is removed. If we restore it and the attacker still has access, we might be back in the same situation.” versus “We need another six hours to complete analysis and be confident in the remediation.” Sometimes leadership will accept the risk of faster restoration. That’s their call if they understand what they’re accepting. But they need to understand it clearly—and sometimes you need to make the strategic call, whether that’s taking systems down to enable cleanup or keeping them running to ensure you’ve found everything. The Post-Incident Review After the incident is resolved, you need to do a proper post-incident review. What happened? How was it detected? How long did response take? What worked well? What didn’t work? What would we do differently? What changes do we need to make to prevent similar incidents or respond better next time? This is where you capture lessons learned and turn them into improvements. It’s also where you document the incident fully for future reference. Be honest in this review. If something didn’t work, say so. If someone made a mistake that contributed to the incident, document that without making it personal. If you got lucky and the impact could have been worse, acknowledge that. The goal is learning, not blame. The goal is making the organization more resilient, not making people feel bad about what went wrong. And actually implement the improvements that come out of the review. Too many post-incident reviews result in great recommendations that never get acted on. If you’re going to take the time to document lessons learned, follow through on them. Practical Takeaways Incident response is organizational and political, not just technical. Plan for both. Real incidents are messier than tabletop exercises. You’ll make decisions with incomplete information under time pressure. Manage executive communication carefully. Regular updates, translate technical to business impact, be honest about uncertainty. Notification decisions are legal and business decisions. Provide accurate information about what you know and don’t know. Document everything during the incident. Designate a scribe, use shared timelines, document decisions with rationale. Tailor communication to different audiences. Technical detail for responders, business impact for executives, clear information for affected parties. Avoid blame during active response. Save accountability discussions for post-incident review. Plans are starting points, not scripts. Be prepared to adapt to incidents that don’t fit the playbook. Balance recovery pressure with thorough remediation. Be clear about trade-offs and risks. Do proper post-incident reviews and actually implement the improvements. Turn incidents into learning opportunities. The post Week 12: Incident Response Is Half Politics appeared first on Cultivating Security.
-
17
Week 11: When ‘Best Practices’ Don’t Apply
Every security framework, every certification course, every vendor white paper tells you what you should do. Implement least privilege. Segment your network. Patch within 30 days. Enforce MFA everywhere. Use zero trust architecture. All of this is good advice. In theory. In practice, you’re working in an environment with legacy systems that can’t be easily changed, technical debt that accumulated over years, resource constraints that limit what’s actually achievable, and business requirements that sometimes conflict with security best practices. So you’re left figuring out: when do I insist on the textbook approach, and when do I accept that we need a different solution that’s good enough given our constraints? This is where judgment matters. Where experience matters. Where understanding the difference between “this is suboptimal but acceptable” and “this is actually dangerous and we can’t accept it” makes the difference between being effective and being either rigid or reckless. The Legacy System Problem You have a legacy application that’s critical to the business. It runs on an operating system that’s no longer supported. It can’t be upgraded because the vendor doesn’t support newer OS versions. It can’t be replaced because it would cost millions and take years. Best practice says: don’t run unsupported operating systems. They don’t get security patches. Every vulnerability that gets discovered remains unpatched forever. Reality says: this system is running business-critical processes and it’s not going away anytime soon. So what do you do? You can’t magically make the application work on a supported OS. You can’t wave a wand and get budget for a multi-million dollar replacement project. You can’t just turn it off because the business depends on it. What you can do is implement compensating controls. Segment it so it’s not directly accessible from the internet or the general corporate network. Monitor it closely. Restrict access to only the people and systems that absolutely need it. Put additional layers of defense around it. Accept that the system itself is vulnerable, but reduce the likelihood and impact of that vulnerability being exploited. Is this ideal? No. Is it acceptable given the constraints? Sometimes yes. The judgment call is whether the compensating controls are sufficient to reduce the risk to an acceptable level. Sometimes they are. Sometimes they’re not, and you need to escalate and push for the replacement project even though it’s expensive and difficult. The Technical Debt Trap Technical debt accumulates. Applications get built with hard-coded credentials because that was expedient at the time. Service accounts get created with overly broad permissions because figuring out the minimum necessary access was time-consuming. Integrations get implemented in ways that work but aren’t secure because the deadline was tight. Best practice says: fix all of it. Implement proper secrets management. Enforce least privilege. Rebuild integrations properly. Reality says: you have finite resources and fixing all the technical debt would take years of dedicated effort that you don’t have bandwidth for. So you prioritize. What technical debt creates the most risk? What’s easiest to fix relative to the risk reduction? What can be addressed incrementally versus what requires a big-bang fix? You might decide that hard-coded credentials in production applications are unacceptable and need to be fixed even if it’s difficult. But hard-coded credentials in rarely-used internal tools are lower priority and can wait until you have time. You might decide that overprivileged service accounts with access to production databases get fixed first. Overprivileged accounts in development environments get fixed eventually but not immediately. This is triage. You’re making trade-offs based on realistic assessment of risk versus effort. Not because you don’t care about the other technical debt, but because you can’t fix everything at once and you need to focus on what matters most. The Resource Constraint Reality Best practices assume you have adequate resources. Budget for tools. Staff to implement and maintain controls. Organizational capacity for change. Leadership buy-in and support. Most organizations don’t have adequate resources. You have to work with what you’ve got. Maybe you’d like to implement a full SIEM with a security operations center. But you have budget for a basic logging solution and no headcount for analysts. So you implement what you can afford, automate what can be automated, and accept that your detection capabilities are limited. Maybe you’d like to have dedicated security engineers embedded in development teams. But you have three security people for the entire organization. So you build security champions in the dev teams, provide guidance and tools, and accept that you can’t review everything. Maybe you’d like to implement comprehensive security awareness training with simulations and role-based content. But you have budget for an annual basic training module. So you focus on the highest-risk behaviors and supplement with targeted communications about active threats. Maybe you’d like to enforce stronger access controls across legacy systems. But leadership doesn’t see it as a priority and won’t support the organizational change required. So you focus on the highest-risk systems where you can make the case, document the gaps in the rest, and work incrementally toward broader coverage when you can build more support. None of this is ideal. But it’s making realistic trade-offs based on actual constraints. The mistake would be doing nothing because you can’t do everything. Partial implementation of security controls is still better than no implementation. The Business Requirement Conflict Sometimes business requirements genuinely conflict with security best practices. The business needs to share data with partners who have weaker security practices than you’d like. Best practice would be to only share with partners who meet your security standards. Business reality is that you don’t always get to choose your partners—sometimes the business relationship is critical and you have to work with what you’ve got. The business needs to enable a workflow that requires more privileged access than you’d ideally grant. Best practice would be to redesign the workflow. Business reality is that redesigning the workflow would affect revenue-generating processes and isn’t happening. The business needs to deploy a new feature on a tight timeline that doesn’t allow for complete security review. Best practice would be to never deploy without thorough security assessment. Business reality is that missing the market window has costs too. In these situations, your job isn’t to just say no. It’s to understand the business requirement, assess the risk it creates, and figure out what mitigations are possible given the constraints. Maybe you can’t redesign the partner integration, but you can limit what data is shared and monitor the integration closely. Maybe you can’t change the privileged access requirement, but you can add additional logging and alerting. Maybe you can’t delay the feature launch, but you can implement basic security controls now and plan for improvements in the next release. You’re not accepting risk blindly. You’re making informed trade-offs with appropriate mitigations. The “Good Enough” Threshold How do you know when something is good enough versus when it’s unacceptably risky? There’s no formula. It’s judgment based on understanding the specific risk, the specific environment, and the specific constraints. Some factors that matter: Exposure. Is this accessible from the internet, or is it internal-only? Is it in a DMZ, or is it on the general corporate network? Exposure level changes the risk calculation significantly. Data sensitivity. Does this system handle customer PII, financial data, health information? Or is it internal operational data that’s not particularly sensitive? Risk to sensitive data raises the bar for what’s acceptable. Likelihood of exploitation. Is this a known, actively exploited vulnerability? Or is it a theoretical weakness that would be difficult to exploit in practice? Active threats raise urgency. Compensating controls. What other layers of defense exist? If this control is weak but there are multiple other controls that would prevent the same attack, that’s different from this being a single point of failure. Cost and complexity of improvement. Is there a straightforward fix, or would proper remediation require major architectural changes? Sometimes “good enough” is what’s achievable, and perfect is years away. Organizational risk tolerance. Different organizations have different appetites for risk based on industry, regulatory environment, and business model. What’s acceptable in a startup is different from what’s acceptable in a bank. The judgment call is weighing all of these factors and deciding whether the current state is acceptable or whether it needs to be escalated and addressed despite the difficulty. When to Insist on Best Practice There are situations where you shouldn’t compromise. Cryptography. Don’t accept weak encryption because it’s easier to implement. Don’t accept custom cryptography because someone thought they could do better than standard algorithms. This is an area where best practices should be followed strictly because the consequences of getting it wrong are severe and the expertise required to do it correctly is specialized. Authentication to critical systems. MFA for administrative access to production systems, financial systems, systems containing sensitive data—this is non-negotiable. The risk of credential compromise is too high and the mitigation is well-understood and achievable. Critical vulnerabilities in internet-facing systems. If there’s a known, actively exploited vulnerability in a system that’s directly accessible from the internet, that needs to be fixed. Not eventually—now. The risk is too high to accept even temporarily in most cases. Compliance requirements. If something is required for regulatory compliance and there’s no waiver or alternative, you have to do it. The consequences of non-compliance are not acceptable. Obvious security debt in new projects. If you’re building something new, build it right. Don’t accept hard-coded credentials or missing authentication or SQL injection vulnerabilities in new code. Technical debt in legacy systems is a reality you inherit. Technical debt in new systems is a choice. The common thread is: where the risk is high, where the remediation is achievable, where there’s no legitimate reason not to do it properly—insist on best practice. When to Accept Trade-offs There are also situations where accepting something less than ideal is reasonable. Legacy systems with compensating controls. If the system can’t be fixed immediately but the risk can be mitigated with other layers of defense, that’s often acceptable. Low-risk systems with low-priority findings. Not every vulnerability needs immediate remediation. Low-severity findings in low-risk systems can be scheduled for when resources are available. Partial implementation while full implementation is in progress. If you’re rolling out MFA but it takes time to implement everywhere, having it on the most critical systems first and expanding coverage over time is reasonable. Business-critical processes that can’t be interrupted. If proper remediation requires downtime during a critical business period, sometimes you accept the risk short-term and schedule the work for a maintenance window. Resource-constrained environments doing the best they can. If an organization genuinely doesn’t have the resources to implement everything properly, focusing on the highest-risk areas and accepting gaps in lower-risk areas is pragmatic. The key is being honest about what you’re accepting and why. Documenting it. Making sure decision-makers understand the risk. And having a plan for improvement even if it’s not immediate. The Communication Challenge When you’re accepting something that’s not best practice, you need to communicate that clearly. Not: “This is fine.” But: “This is not ideal. Here’s the risk. Here’s why we can’t fix it immediately. Here’s what we’re doing to mitigate the risk in the meantime. Here’s the plan for proper remediation.” That transparency is important. It makes sure people understand what they’re accepting. It documents your professional opinion. It shows you’re being realistic, not just rubber-stamping everything. It also positions you as someone who understands constraints and works within them, rather than someone who just says no to everything that’s not textbook perfect. Avoiding Rationalization The danger in accepting trade-offs is that it can become a slippery slope. Every deviation from best practice comes with a rationale. Eventually you’re accepting things that really aren’t acceptable, and you’ve rationalized it as pragmatic. The check against this is periodic review. Are the temporary mitigations actually temporary, or have they become permanent? Are the compensating controls still in place and effective, or have they degraded? Are the plans for eventual remediation actually moving forward, or have they been indefinitely delayed? If “temporary” means “indefinite” and “we’ll fix it later” means “we’ll never fix it,” then you’re not making pragmatic trade-offs—you’re accepting poor security and calling it realistic. Be honest with yourself about this. Accepting imperfection within a clear improvement plan is pragmatic. Accepting imperfection with no intention of improvement is just accepting poor security. Building Toward Better Even when you’re accepting trade-offs, you should be working toward improvement. That means documenting what’s not ideal and why. Maintaining a list of technical debt and security gaps. Having a plan—even if it’s a multi-year plan—for addressing them. Put this in your risk register. Document each accepted risk with the reasoning, the compensating controls, and the plan for eventual remediation. This helps you prioritize—you can focus on the riskiest items first, but you can also identify the quick wins: lower-cost fixes that mostly need human time rather than budget. And here’s an important signal to watch: the trend over time and the severity distribution. If you’re early in your security program and doing discovery, your risk register will grow—that’s expected. You’re finding historical issues that have been there all along. But if you’re in steady-state operations and your risk register keeps growing quarter over quarter, especially with high or critical severity items, that tells you something. You’re not making pragmatic trade-offs anymore—you’re falling further behind. New risks are being introduced faster than you can remediate existing ones. Similarly, if your risk register has 500 items but they’re mostly low severity with compensating controls, that’s a different situation than 50 items that are all high severity with inadequate mitigations. That’s information leadership needs to see. A growing count of high-severity accepted risks becomes evidence that current resource levels aren’t adequate for maintaining reasonable security posture. Beyond tracking the risk register, your focus should be on forward movement: It means making incremental progress. Even if you can’t fix everything, fixing the worst things makes the overall posture better. It means building security into new projects properly so you’re not accumulating more debt. The existing debt might be a reality you inherit, but at least you’re not making it worse. And it means advocating for the resources to do things properly. If you’re constantly accepting trade-offs because you don’t have adequate resources, that’s information leadership needs to hear. They might not fund everything you ask for, but they should understand the gap between current state and adequate security. Practical Takeaways Best practices are guidance, not absolute rules. They assume conditions that don’t always exist. Legacy systems and technical debt are realities. Focus on compensating controls when immediate remediation isn’t feasible. Resource constraints are real. Prioritize based on risk versus effort. Partial implementation beats no implementation. Some business requirements conflict with security best practices. Your job is to mitigate risk within constraints, not just say no. Good enough depends on exposure, data sensitivity, likelihood of exploitation, compensating controls, and organizational risk tolerance. Insist on best practice for cryptography, authentication to critical systems, critical vulnerabilities in exposed systems, and compliance requirements. Accept trade-offs when risk is lower, when remediation isn’t immediately feasible, or when resources are constrained—but document what you’re accepting and why. Communicate clearly about risks being accepted and plans for improvement. Transparency matters. Avoid the rationalization trap. Temporary should actually be temporary. Review regularly whether mitigations are still in place. Make incremental progress toward better security even when you can’t fix everything immediately. The post Week 11: When ‘Best Practices’ Don’t Apply appeared first on Cultivating Security.
-
16
Week 10: Compliance Is Not Security (But You Still Have to Care)
Every security person eventually has this realization: passing the audit doesn’t mean you’re secure. You can check every box in the compliance framework. You can get your SOC 2 certification. You can satisfy your PCI audit. And still have significant security gaps that the auditor never looked at because they weren’t in scope. Compliance frameworks test for specific controls. They verify that you’re meeting defined requirements. They don’t assess whether those requirements are sufficient for your actual risk profile. They don’t test for risks that aren’t in the framework. They don’t evaluate how well your security program actually functions beyond what’s documented. But here’s the thing: you still have to care about compliance. Because compliance failures have immediate business consequences. Customer contracts depend on it. Regulatory penalties apply when you’re non-compliant. Business opportunities get lost if you can’t demonstrate compliance. So you’re stuck navigating this tension: compliance isn’t security, but you can’t ignore it. You need to pass audits without letting audit requirements become your entire security program. What Compliance Actually Tests Compliance frameworks test for the presence of controls and documented processes. They verify that what you say you do is what you actually do. “Do you have a documented information security policy?” Yes. Box checked. “Do you perform background checks on employees with access to sensitive data?” Yes. Box checked. “Do you have a process for reviewing user access quarterly?” Yes, here’s the documentation. Box checked. This is not trivial. Having documented policies and processes matters. Consistency matters. Being able to demonstrate that you’re following your own policies matters. But it doesn’t tell you whether your policies are adequate. Whether your access review process actually catches inappropriate permissions. Whether your incident response plan would work during a real incident. Auditors are testing against a standard, not against your specific risks. They’re verifying that controls exist, not that those controls are effective for your environment. The Scope Problem Audits have scope boundaries. They test the systems and processes that are in scope. Everything else is excluded. Your SOC 2 audit might cover your production environment. Your development environment isn’t in scope. Your DevOps pipeline isn’t in scope. Your SaaS applications might not be in scope. Your PCI audit covers the cardholder data environment. Everything that’s properly segmented out of the CDE isn’t in scope. This creates blind spots. Systems that matter for your security posture but aren’t included in compliance scope don’t get tested. Risks that aren’t addressed by the compliance framework don’t get evaluated. You can be fully compliant and still have significant security issues in out-of-scope systems or risks that the framework doesn’t address. Understanding scope is critical. Compliance tells you something about the systems and controls that were tested. It tells you nothing about what wasn’t tested. The Documentation vs. Reality Gap Auditors test documentation. They verify that your processes are documented and that you can show evidence of following them. If your documentation says you review access quarterly and you can produce evidence of those reviews, you pass. Whether those reviews actually resulted in removing inappropriate access is a different question. If your incident response plan is documented and you can show that people are trained on it, you pass. Whether it would actually work during a high-stress incident with incomplete information is not tested. If your change management process is documented and you can show approval records, you pass. Whether unapproved changes happen anyway because the process is too cumbersome and people work around it—that might not be visible to the auditor. Compliance measures adherence to documented processes. It doesn’t measure effectiveness of those processes or whether people actually follow them consistently. This creates an incentive to optimize for the audit rather than for actual security. Make sure the documentation is clean, make sure the evidence is available, make sure you can demonstrate compliance. Whether the security posture is actually strong is secondary. Good organizations resist this incentive. They use compliance as a minimum baseline and build beyond it. Less mature organizations treat compliance as the goal and stop there. The Snapshot Problem Audits are point-in-time assessments. They look at your security posture during the audit period, verify controls, and issue a report. That report becomes stale immediately. Your environment changes. New systems get deployed. Configuration drift happens. People leave and new people join. The documented state that passed audit diverges from current reality. Some compliance frameworks require continuous monitoring or periodic re-assessment. That helps. But there’s always a gap between the last time something was verified and the current state. Organizations with weak security discipline let that gap grow large. They tighten up for the audit, pass, then drift back to less rigorous practices until the next audit cycle. Organizations with strong security discipline maintain consistent practices regardless of audit timing. The audit verifies what they’re already doing. But either way, a compliance certification tells you what was true when it was issued. Not what’s true now. When Compliance and Security Align There are areas where compliance requirements and good security practice overlap significantly. Access controls. Most frameworks require some form of least privilege and access review. That’s also good security practice. Logging and monitoring. Frameworks typically require audit logging. That’s foundational for security as well. Encryption. Frameworks require protecting data in transit and at rest. That’s baseline security. Incident response. Having a documented plan and testing it is both a compliance requirement and a security necessity. In these areas, compliance requirements push organizations to do things they should be doing anyway. The compliance forcing function can be valuable—it creates business pressure to implement controls that might otherwise get deprioritized. This is where you can leverage compliance to advance security. “We need to do this to pass the audit” is often an easier sell than “we should do this for security reasons.” Use that when it works. Where Compliance Falls Short Compliance frameworks are generic. They’re designed to apply to many different types of organizations. That means they can’t be optimized for your specific risk profile. You might have unique risks that the framework doesn’t address. You might be in an industry with specific threats that generic frameworks don’t account for. You might have architectural patterns that create vulnerabilities the framework doesn’t test for. Compliance gives you a baseline. It doesn’t give you a complete security program. Frameworks also tend to lag behind threat evolution. By the time a control becomes a compliance requirement, it’s often already considered baseline security practice. The bleeding-edge threats and risks aren’t in the framework yet because there isn’t consensus on how to address them. If you’re only doing what compliance requires, you’re behind. Compliance is the floor, not the ceiling. The Audit Relationship Auditors are evaluating you against a standard. They’re not adversaries, but they’re also not consultants there to help you improve. Their job is to verify that you meet the requirements. They’re looking for evidence of compliance. When they find gaps, they document them as findings. How you respond to findings matters. Some findings are legitimate—you’re not meeting a requirement and you need to fix it. Some findings are debatable—you’re meeting the requirement differently than the auditor expected, or there’s ambiguity in how the requirement should be interpreted. You can push back on findings if you have a legitimate case. But pick your battles. Fighting every finding burns relationship capital and creates friction that might make future audits harder. It’s also worth building a good working relationship with your auditors. Being organized, responsive, and transparent makes the audit process smoother. Trying to hide problems or being difficult to work with makes auditors dig deeper. Auditors talk to each other. Your reputation with auditors affects how they approach your audit. If you’re known as an organization that takes compliance seriously and is straightforward to work with, that helps. If you’re known as an organization that cuts corners and fights everything, that works against you. Using Compliance as a Forcing Function Compliance requirements can be useful for getting resources and organizational buy-in for security work. “We need to implement MFA to maintain our SOC 2 certification” is often more compelling than “we should implement MFA because it’s good security practice.” “We have an audit finding that requires remediation by end of quarter” creates urgency that “we should probably address this risk at some point” doesn’t. “Customer contracts require us to maintain PCI compliance” is a business driver that’s hard to argue with. This isn’t manipulation. It’s recognizing that different stakeholders respond to different motivations. Leadership might not prioritize security risk in the abstract, but they will prioritize avoiding failed audits or lost business. Use compliance requirements strategically to advance security work that you know needs to happen anyway. But be honest about it. Don’t claim something is a compliance requirement if it isn’t. That destroys credibility when you get caught. The Over-Compliance Trap Some organizations treat compliance as the definition of security. If it’s in the compliance framework, we do it. If it’s not in the framework, we don’t. This is dangerous because it means you’re optimizing for someone else’s generic risk model instead of your actual risks. You might spend significant resources on controls that don’t matter much for your environment because they’re compliance requirements. Meanwhile, risks that are significant for you but aren’t in the framework go unaddressed. Mature security programs use compliance as one input among many. They implement controls because they make sense for their risk profile, and compliance is a factor in prioritization but not the only factor. Less mature programs conflate compliance with security. “We passed the audit so we’re secure.” That’s a dangerous assumption. The Measurement Problem Compliance produces binary outcomes. You pass or you don’t. You’re certified or you’re not. Security is continuous and gradual. Your security posture is always improving or degrading, it’s never static. And improvement isn’t binary—you can be more secure this quarter than last quarter without passing any particular certification threshold. Organizations often measure security by compliance status because it’s clean and reportable. “We achieved SOC 2 Type II certification” is an executive-friendly metric. “We improved our detection capabilities and reduced mean time to detect by 30%” is a more meaningful security metric but harder to communicate. This creates pressure to optimize for compliance metrics even when they’re not the most important security measurements. The answer isn’t to ignore compliance metrics. It’s to have better security metrics alongside them. Measure both compliance status and actual security capability. Don’t let the clean compliance metrics crowd out the messier but more meaningful security measurements. Living in Both Worlds The reality is you have to care about both security and compliance. You can’t ignore compliance because it has business consequences. You can’t treat compliance as sufficient because the gaps leave you exposed. The approach that works: Treat compliance as a minimum baseline. Meet the requirements. Pass the audits. But recognize that this is the floor, not the ceiling. Use compliance to advance security work. When compliance requirements align with security needs, use that alignment to get resources and organizational buy-in. Identify gaps between compliance and actual risk. Where does the compliance framework leave you exposed? Address those gaps even though they’re not required. Monitor upcoming changes to compliance frameworks. Follow draft updates, proposed revisions, and industry working groups shaping the next iteration of standards. If you implement controls before they become requirements, you avoid scrambling when the framework updates and you get ahead of future audit findings. This also positions you as forward-thinking rather than purely reactive. Maintain security rigor regardless of audit timing. Don’t just tighten up before audits. Maintain consistent practices. Build relationships with auditors. Make the process smoother by being organized and transparent. Don’t over-index on compliance metrics. Measure actual security capability alongside compliance status. Be honest about what compliance means. It’s a certification that specific controls were verified at a point in time. It’s not a guarantee that you’re secure. Practical Takeaways Compliance frameworks test for specific controls, not for comprehensive security. Passing an audit doesn’t mean you’re secure. Audit scope is limited. Out-of-scope systems and risks not addressed by the framework don’t get tested. Documentation doesn’t equal effectiveness. Having documented processes doesn’t mean they work well in practice. Compliance is a point-in-time assessment. Certifications become stale as your environment changes. Use compliance as a forcing function to get resources for security work that needs to happen anyway. Don’t treat compliance as the definition of security. It’s one input, not the complete picture. Maintain consistent security practices, not just compliance theater before audits. Build relationships with auditors. Being organized and transparent makes the process smoother. Measure both compliance status and actual security capability. Don’t let compliance metrics crowd out meaningful security measurements. Compliance is the floor, not the ceiling. Meet requirements, but build beyond them based on your actual risks. The post Week 10: Compliance Is Not Security (But You Still Have to Care) appeared first on Cultivating Security.
-
15
Week 9: Reading the Room: What Your CISO Actually Cares About
If you’re trying to get security work done, you need to understand what your leadership cares about. And I mean actually cares about, not what they say in all-hands meetings or what’s in the security strategy document. Because there’s often a gap between the official priorities and the real priorities. Between what sounds good and what actually drives decisions. Between the aspirational vision and the day-to-day reality of what gets resources and attention. This isn’t about your CISO being dishonest. It’s about the difference between what they wish they could focus on and what they’re actually accountable for. Between long-term strategic goals and immediate pressures. Between building the ideal security program and managing the organization they actually have. Understanding this gap—and learning to operate effectively within it—is critical. Because if you’re optimizing for what you think leadership cares about instead of what they actually care about, you’re going to be confused when your priorities don’t get support. The Board and Executive Pressure Your CISO has a boss. Usually it’s the CEO or CFO or CIO. And that person has priorities that shape what your CISO can realistically focus on. If the board is asking about cybersecurity risk quarterly, that’s going to drive attention to board-presentable security initiatives. Things that show measurable progress. Things that can be explained to non-technical executives. Things that demonstrate the organization is taking security seriously. That might mean compliance certifications even if they’re not the most impactful security work. That might mean high-visibility projects like MFA deployment even if there are more critical but less visible gaps. That might mean metrics that look good in a board deck even if they’re not the most meaningful security measurements. This isn’t your CISO being shallow. This is them managing upward to people who control budget and strategic direction. If the board cares about something, the CISO has to care about it—or at least has to show they’re addressing it. Similarly, if the CEO is worried about customer trust, security work that protects customer data gets prioritized. If the CFO is worried about financial risk, security work that prevents fraud or reduces insurance premiums gets attention. If the business is pursuing enterprise customers who require SOC 2, that becomes the priority. Your CISO’s priorities are shaped by what their leadership cares about. If you want to understand what will get resourced, start there. The Audit and Compliance Reality A lot of CISOs spend more time on compliance than they’d like. Not because compliance is the most important security work, but because compliance failures have immediate, measurable consequences. Audit findings have remediation deadlines. Compliance certifications affect customer contracts. Regulatory requirements have penalties for non-compliance. These create forcing functions that security improvements often don’t have. Your CISO might know that improving detection capabilities is more valuable than fixing the specific audit finding. But the audit finding has a deadline and potential business impact. The detection capability is important but not urgent. So compliance work gets prioritized. Not because it’s the best security work, but because it’s the security work with clear deadlines and clear consequences for not doing it. Understanding this helps you frame security initiatives. If you can connect your project to compliance requirements, it’s more likely to get resources. If you’re proposing work that’s purely about risk reduction without any compliance component, you’re competing against things that have regulatory or contractual forcing functions. That doesn’t mean you can’t get non-compliance work funded. But you need to make a stronger case, because it doesn’t have the built-in pressure that compliance work has. (We’ll dive much deeper into the compliance-versus-security tension in Week 9. For now, just understand that your CISO is navigating this dynamic constantly—compliance creates forcing functions that security risk assessments often don’t.) The Incident Pressure If your organization has had a security incident recently, that shapes priorities dramatically. The weakness that got exploited suddenly gets attention. If it was a phishing attack, now there’s budget for security awareness training. If it was unpatched vulnerabilities, now there’s pressure to improve patch management. If it was inadequate logging that made investigation difficult, now there’s support for logging improvements. This is unfortunate because it means security improvements are reactive rather than proactive. But it’s also reality. Incidents create urgency and political will that risk assessments often don’t. Your CISO is operating in this environment. If there’s been a recent incident, proposals that address similar risks get easier approval. Proposals that address different risks have to fight harder for attention. If your organization hasn’t had a major incident, there’s less urgency generally. Security is important, but it’s competing with other important things that have more immediate pressure. The other dynamic is incident preparedness. If your CISO is worried about the next incident (because of industry trends, peer organizations getting hit, increasing threat activity), they’re going to care about detection, response capabilities, and forensic readiness. Projects that improve those capabilities align with that concern. The Resource Constraints Your CISO is managing with finite budget, finite staff, and finite organizational capacity for change. They might know that the ideal security program includes a dozen major initiatives. But they can realistically fund three this year. They have to choose. They might want to hire five more security analysts. But they’re approved for two headcount and they’re competing with every other department that also wants to hire. They might want to implement comprehensive security improvements. But the IT team is already overloaded, and asking them to take on more work means something else gets delayed or dropped. These constraints are real. Acknowledging them when you’re proposing work shows you understand the environment you’re operating in. This ties directly back to what we covered in Week 7 about why security projects fail. Budget competition, finite staff capacity, and organizational bandwidth constraints aren’t excuses—they’re the reality your CISO navigates every day. Understanding these constraints helps you propose work that can actually succeed. This means being realistic about scope. Proposing a multi-year initiative when there’s budget for a one-year pilot. Proposing something that can be implemented with existing staff or modest contractor help rather than requiring three new headcount. Proposing something that integrates with current work rather than requiring a separate dedicated effort. Your CISO is looking for proposals that deliver value within realistic constraints. Not proposals that require perfect conditions and unlimited resources. The Risk They’re Actually Worried About Your CISO has a mental model of the organization’s biggest security risks. That model might not match yours. Maybe you think the biggest risk is inadequate network segmentation. They think it’s third-party vendor risk because of recent supply chain attacks in your industry. Maybe you think the priority should be improving vulnerability management. They think it’s insider threat because of recent employee incidents. Maybe you think the focus should be technical controls. They think it’s security culture because the organization keeps making the same mistakes. Understanding what they’re actually worried about—not what the risk assessment says, but what keeps them up at night—helps you align your work with their priorities. Sometimes you can learn this from direct conversation. “What are you most concerned about right now?” is a reasonable question. Sometimes you can infer it from what gets attention and resources. What do they ask about in meetings? What do they fund even when budget is tight? What do they escalate when incidents happen? If you’re proposing work that addresses a risk they’re not worried about, you need to make the case that they should be worried about it. If you’re proposing work that addresses a risk they are worried about, you’re aligned with their existing priorities and you’ll get a much warmer reception. The Measurable Progress Problem Leadership loves metrics. Your CISO probably has to report security metrics to executive leadership or the board. This creates pressure to work on things that produce measurable improvement. Patch compliance percentages. Percentage of systems with MFA. Number of security training completions. Time to detect and respond to incidents. Not all valuable security work produces clean metrics. Cultural change is hard to measure. Architectural improvements might not show up in standard dashboards. Detection engineering doesn’t produce a simple percentage. But measurable work gets reported up the chain. It shows progress. It demonstrates that the security program is doing something. Your CISO cares about work that produces results they can show their boss. If your proposal will improve security but won’t produce any measurable evidence of that improvement, that’s a harder sell. This doesn’t mean you should only work on things that produce metrics. But it does mean that if you can articulate how success will be measured and demonstrated, your proposal becomes more attractive. The Political Capital Budget Your CISO has political capital with other leaders, and they spend it carefully. Every time they push for something that creates friction—delays a project for security review, denies a business request, requires organizational change that people resist—they’re spending political capital. If they spend it too freely, they lose influence. People stop taking them seriously. Their requests get ignored or worked around. So they pick their battles. They push hard on things that matter most. They compromise on things that matter less. They build relationships so they have capital to spend when they need it. This means they’re not going to fight for every security proposal you bring them. They’re going to support the ones that are most important and most defensible. The ones where the risk is clear and the solution is reasonable. If you bring them a proposal that would burn political capital (requires other leaders to give up something they value, creates significant friction, challenges existing relationships), you’d better have a very strong case for why it’s worth it. Understanding this dynamic means being selective about what you escalate and ask them to fight for. Bring them things that matter. Don’t burn their political capital on things that are nice-to-have or that you could accomplish through other channels. If you read Week 6 on reporting through IT leadership and Week 7 on political navigation, this concept should sound familiar. Your CISO is doing the same thing at a higher level—managing relationships, building capital, picking battles. The skills we talked about for you to operate effectively apply to them too. They’re just operating with higher stakes and broader organizational scope. What They Wish They Could Focus On Most CISOs wish they could spend more time on strategic, proactive security work. Threat modeling. Architecture review. Long-term program building. What they actually spend time on is often more reactive and tactical. Compliance deadlines. Incident response. Vendor negotiations. Executive reporting. Firefighting. They know what the ideal looks like. They also know what’s realistic given their constraints. When you’re proposing work, understand the difference between what they wish they could prioritize and what they can actually prioritize right now. If you’re proposing strategic, long-term work, acknowledge that it’s competing with immediate demands on their time and attention. Frame it in a way that shows you understand the trade-offs. If you can find ways to make strategic work fit into their current reality—phased implementation, leveraging existing resources, aligning with compliance requirements—you make it easier for them to say yes. How to Communicate Effectively Understand their constraints before you ask for things. Budget cycle timing. Current incidents or pressures. Political relationships. Compliance deadlines. What they’ve already committed to. Frame proposals in business terms. Not just technical security improvement—impact on risk, compliance, customer trust, operational resilience. The things they have to report on. Be realistic about scope and resources. Don’t propose things that require conditions that don’t exist. Propose things that can actually be accomplished. Articulate measurable outcomes. How will you demonstrate that this work achieved something? What metrics or evidence will show progress? Align with their known priorities. If they’re focused on third-party risk, propose work that addresses that. If they’re focused on compliance, connect your work to compliance requirements. Bring solutions, not just problems. They already know there are risks everywhere. If you’re bringing them a problem, bring at least one realistic option for addressing it. Pick your battles. Don’t escalate everything. Don’t ask them to fight for everything. Reserve that for things that genuinely matter. Help them manage up. If your work produces something they can show their boss or the board, make that easy. Give them the summary, the metrics, the explanation that they can use. What This Isn’t This isn’t about being political in a manipulative sense. It’s about being effective. Your CISO has constraints and pressures that shape what they can realistically accomplish. Understanding those constraints helps you propose work in ways that are more likely to succeed. It’s about communicating in terms they care about, aligning with their priorities, and being realistic about what’s achievable. That’s not manipulation—that’s basic organizational effectiveness. Practical Takeaways Your CISO’s priorities are shaped by what their leadership cares about. Understand the board and executive pressures they’re managing. Compliance work gets prioritized because it has deadlines and consequences. Connect your proposals to compliance requirements when possible. Recent incidents drive priorities. Work that addresses similar risks gets easier approval. Resource constraints are real. Propose work that fits within realistic budget, staffing, and organizational capacity. Understand what risks they’re actually worried about, not just what’s in the formal risk assessment. Measurable work gets funded more easily. Articulate how success will be demonstrated. Political capital is finite. Don’t ask them to fight for everything. Be selective about what you escalate. Frame proposals in business terms with clear outcomes. Make it easy for them to say yes. Help them manage up. If your work produces board-reportable results, make those easy to communicate. Read the room. Understand what they’re actually able to focus on versus what they wish they could focus on The post Week 9: Reading the Room: What Your CISO Actually Cares About appeared first on Cultivating Security.
-
14
Week 8: Why Security Projects Fail (And It’s Usually Not Technical)
You’ve probably seen this: a security initiative that makes perfect technical sense, that addresses real risk, that has clear value—and it dies anyway. Not because the technology doesn’t work. Not because the solution is flawed. It dies in a conference room during a budget meeting, or gets deprioritized when a business initiative takes precedence, or gets killed because nobody wants to deal with the organizational change it requires. Security projects fail for organizational and political reasons far more often than they fail for technical reasons. And if you don’t understand those dynamics, you’re going to keep proposing good ideas that go nowhere, and you’re going to get frustrated wondering why leadership “doesn’t take security seriously.” Often they do take it seriously. They’re just managing constraints and priorities that you might not see or understand. Your job is to figure out how to work within those constraints, or how to change the constraints if they’re actually fixable. A note on broader applicability: This post focuses on security projects specifically, but the organizational dynamics we’re covering—budget competition, change resistance, executive sponsorship, political navigation—apply to ANY initiative that requires organizational change and resources. If you’re a network engineer trying to get budget for infrastructure upgrades, a developer advocating for technical debt reduction, or an operations manager proposing process improvements, the same patterns apply. Understanding these dynamics makes you more effective regardless of what type of initiative you’re trying to drive. The Budget Reality Security competes with everything else the organization needs to spend money on. That SIEM you want? That’s $200K annually. Which is also three mid-level engineers. Or a business analyst the sales team has been requesting. Or infrastructure upgrades that have been delayed for two years. Or the CRM implementation that’s going to improve customer retention. Money is finite. Saying “security is important” doesn’t create budget. Every dollar spent on security is a dollar not spent on something else. Your leadership is making trade-offs constantly. Revenue-generating initiatives tend to get prioritized because they justify themselves—”if we invest X, we’ll generate Y in additional revenue.” That’s a clear ROI calculation. Security investment is about avoiding negative outcomes. “If we invest X, we reduce the probability of a breach that might cost Y.” That’s a risk reduction calculation, which is inherently squishier and less compelling. It shouldn’t be that way. The cost of a breach can be quantified reasonably well. But psychologically, spending money to enable growth is more appealing than spending money to prevent hypothetical harm. Understanding this dynamic doesn’t mean accepting inadequate security budgets. It means framing your requests in ways that acknowledge the reality of budget competition and make the clearest possible case for why this particular investment matters more than the alternatives. The Organizational Change Problem A lot of security improvements require people to change how they work. And people hate changing how they work. Implementing MFA means users have to do an extra step when they log in. Rolling out a new access review process means managers have to spend time reviewing permissions. Enforcing least privilege means people lose access they’ve had for years and have to request it when they need it. These are all reasonable security measures. They’re also friction. And friction generates resistance. Sometimes the resistance is loud. “This is going to slow us down.” “This is going to hurt productivity.” “Users are going to hate this.” Sometimes it’s passive—people just don’t do the thing, or they find workarounds, or they complain until leadership caves. Either way, if your security project requires organizational change and you haven’t planned for managing that change, you’re going to struggle. This means communication. Explaining why the change is happening, what the actual impact will be, how it benefits people (or at least the organization). It means providing support during the transition—help desk staffing for the initial rollout, documentation, training. It means having executive sponsorship so that when people push back, leadership reinforces that this is happening. A lot of security people treat this as someone else’s problem. “I designed the technical solution, implementation is operations’ job, change management is HR’s job.” But if you care whether the project succeeds, you care about adoption. And adoption requires managing the human factors, not just the technical ones. The “Not Right Now” Problem Even when people agree a security project is valuable, it’s often not the highest priority right this moment. There’s a major product launch coming. There’s an acquisition being integrated. There’s a regulatory audit happening. There’s a system migration in progress. There’s always something. “Let’s do this after the launch.” “Let’s wait until the audit is done.” “Let’s revisit this next quarter when things calm down.” Sometimes this is legitimate. Timing matters. Implementing major security changes during a critical business period might genuinely be a bad idea. Sometimes it’s avoidance. Next quarter there will be a different reason why it’s not the right time. Things never actually calm down. This is how security work gets perpetually deprioritized. The difference is whether there’s actually a plan to revisit it or whether “not right now” is a soft no. If leadership says “this is important but we need to wait until after the product launch, let’s schedule planning for next month,” that’s different from “we’ll get to this when we have time” with no actual commitment. Your job is to distinguish between realistic timing constraints and indefinite deferral disguised as timing constraints. The Executive Sponsorship Gap Security projects that don’t have executive sponsorship rarely succeed. You can have the best technical plan. You can have identified a real risk. You can have a clear implementation path. But if nobody with organizational authority is backing it, it’ll get killed the first time it creates friction or competes with something else. Executive sponsorship means someone in leadership who will say “this is happening” when people push back. Someone who will defend the budget when it’s challenged. Someone who will prioritize it when other initiatives compete for resources. Someone who has the authority to make decisions and the political capital to make them stick. Without that, you’re trying to implement organizational change through force of argument and technical competence. Which doesn’t work when you don’t have decision-making authority. Finding an executive sponsor means understanding who cares about what you’re trying to accomplish and why. Maybe it’s the CFO who’s worried about financial risk. Maybe it’s the COO who’s concerned about operational disruption. Maybe it’s the General Counsel who understands regulatory exposure. Frame the security work in terms they care about. Build the relationship. Get them bought in. Then they can be your advocate in rooms you’re not in. The Competing Priorities Reality Organizations have finite attention and capacity. Even if budget isn’t the constraint, bandwidth is. The IT team is already working on five major projects. The infrastructure team is underwater dealing with technical debt. The application team is fully allocated to new feature development. The operations team is firefighting daily. Your security project requires time and effort from all of them. Where does that time come from? Either something else gets deprioritized (which means someone else’s project gets delayed or killed, and they’re not going to be happy about that), or people work more hours (which isn’t sustainable and leads to burnout), or your project gets sequenced after their current work (which means it’s delayed, possibly indefinitely). This is the reality of working in organizations. Everything competes for the same resources. Your project isn’t evaluated in isolation—it’s evaluated against everything else people could be doing instead. Understanding this means being realistic about timing and scope. Maybe you can’t do the full implementation right now, but you can do phase one. Maybe you can’t get dedicated resources, but you can get part-time support. Maybe you can’t launch in Q2, but you can realistically launch in Q4. Flexibility in how you approach the work makes it more likely to actually happen. The “We Already Did Security” Problem Sometimes organizations think they’ve already addressed security adequately, and new initiatives are viewed as unnecessary. “We already have a firewall.” “We already do patching.” “We already have antivirus.” “We already passed the audit.” Therefore, additional security investment must not be needed. This is frustrating because security isn’t binary. Having basic controls doesn’t mean you have adequate controls. Passing an audit doesn’t mean you’ve addressed all your risks—it means you’ve met the specific requirements that audit tested. But try explaining that without sounding like you’re moving goalposts or just asking for more budget because security is never done. The way through this is demonstrating gaps clearly. Not theoretical risks—actual gaps in visibility, detection, or capability. “We have a firewall, but we can’t detect lateral movement inside the network.” “We do patching, but we have no visibility into SaaS applications.” “We passed the audit, but we can’t investigate incidents because our logging is inadequate.” Make it concrete. Make it about specific capabilities you lack and specific scenarios where that creates exposure. Abstract arguments about security being an ongoing process are less compelling than concrete demonstrations of what you can’t do right now. The Risk Communication Failure A lot of security projects fail because the risk was never communicated in a way that resonated with decision-makers. You know the threat is real. You understand the potential impact. You can explain the technical details of how an attack would work. None of that matters if leadership doesn’t understand why they should care. This goes back to what we talked about in Week 2 about calibrating your risk tolerance to organizational reality. Your assessment of what’s critical might not match leadership’s assessment—not because they’re wrong, but because they’re weighing factors you might not see. Your job is to translate the risk into terms that connect with their context. And remember from Week 6: speak their language, not yours. Security terms mean nothing to them. Business impact does. “We’re vulnerable to credential theft” is abstract. “If an attacker gets one compromised account, they can access our customer database and financial systems, which means breach notification, regulatory penalties, and significant reputational damage”—that’s concrete. “We need better logging” is vague. “Right now, if we have an incident, we can’t determine what data was accessed or how long the attacker was in our environment, which means we have to assume the worst case for breach notification and we have no evidence to show regulators that we responded appropriately”—that’s specific. Decision-makers need to understand the business impact, not the technical vulnerability. They need to understand the realistic threat, not the theoretical worst case. They need to understand what happens if this risk materializes, in terms of money, reputation, regulatory exposure, operational disruption—things they’re measured on. If you’re explaining security risk in security terms, you’re probably not being understood. The Perfect vs. Good Enough Problem Sometimes security projects fail because they’re over-engineered for the actual need. You want the enterprise-grade solution that solves everything comprehensively. Leadership is willing to fund the good-enough solution that addresses the most critical gap at a fraction of the cost. You want to implement least privilege everywhere. Leadership is willing to do it for the highest-value systems first and revisit the rest later. You want comprehensive logging. Leadership is willing to fund logging for critical systems and compliance requirements. You want to move from traditional antivirus to EDR with full detection and response capabilities. IT is nervous about the “what ifs”—what if it impacts performance, what if it generates too many alerts, what if users complain? Deploy EDR initially in monitor-only mode or with basic detection enabled. Prove it doesn’t break things. Then enable advanced features progressively over the next year or two. You still get better security than AV, just not everything at once. The perfect solution that never gets funded is worse than the good-enough solution that actually gets implemented. This doesn’t mean compromising on things that are genuinely critical. But it means being honest about what’s critical versus what’s nice-to-have. It means being willing to phase implementations. It means accepting that incremental progress is still progress. (Go back to Week 2 if you need a reminder: Fort Knox isn’t the goal. Sometimes the phased, imperfect implementation is actually better security than holding out for the comprehensive solution that never gets approved.) Sometimes the perfect solution isn’t realistic given organizational constraints. Good-enough that’s achievable beats perfect that never happens. The Politics You Can’t Ignore Security work happens in a political environment whether you like it or not. There are power dynamics. There are relationships. There are people who have influence beyond their formal authority. There are historical conflicts that shape current decisions. There are alliances and rivalries. If you ignore all of this and just focus on the technical merits, you’re going to be confused when technically sound projects fail and technically questionable projects succeed. I’m not saying you need to become a politician. But you need to understand the political landscape enough to navigate it. Who are the decision-makers? Who influences them? Who benefits from your project succeeding? Who might perceive it as threatening their domain? Who’s going to push back and why? Here’s something practical: there will be naysayers to any initiative. Learn what they consistently stand on. Trust me—it’s almost always the same exact things, maybe phrased differently, but if you boil it down to the core objection, it’s the same every time. The IT Help Desk manager who pushes back on every new tool? It’s about call volume. He doesn’t want an influx of break-fix calls or password resets. So when you propose MFA, don’t just talk about security benefits. Address his concern directly—and better yet, show how it helps him: “With MFA using authenticator apps, we can now enable self-service password resets. Users can prove their identity and reset their own passwords without calling the help desk. Here’s the training we’ll provide. Here’s the online resource users can reference. And I’ll personally help answer calls during the first week of rollout. Long-term, this should actually reduce your password reset call volume.” The developers who resist using a secrets manager? They’re thinking it’s going to be complicated—that they’ll have to write dedicated functions and rewrite significant portions of their code. Learn their stack. Most secrets managers and vaults provide drop-in modules for common languages. Show them the actual code: “Here’s the three-line change in Python. Here’s the SDK for Node. Here’s how it works in .NET. It’s literally a module import and one function call to retrieve secrets.” Better yet: learn a little bit of the code your developers actually work in and write a few small demo apps that show it in practice. You don’t need to be an expert—just competent enough to prove it works in their environment with their stack. Walking into a meeting with a working example in their language earns respect. It shows you’re not just theorizing from a security ivory tower—you actually understand what you’re asking them to do. And it makes it harder for them to claim it’s too complicated when you’ve already done it. The application team lead who resists security requirements? It’s usually about timelines and scope creep. So when you need security reviews integrated into their process, frame it as: “I can review designs in 24 hours if you get them to me early. That’s faster than trying to retrofit security after implementation, which delays your launch.” Understanding this helps you build support before you need it. It helps you anticipate objections. It helps you frame proposals in ways that address what people actually care about—not just what they say they care about. Security people often want to believe that good ideas win on merit. In functional organizations, merit matters. But it’s never the only thing that matters. What Actually Works Start small and demonstrate value. Don’t ask for everything up front. Ask for enough to prove the concept, deliver results, and build credibility for the next phase. Build coalitions. Find allies in other parts of the organization who benefit from what you’re trying to do. IT operations might care about better logging because it helps them troubleshoot. Legal might care about access controls because it reduces compliance risk. Get them on your side. Frame in business terms. Not technical risk—business impact. Not security concepts—outcomes leadership cares about. Have executive sponsorship. Find someone with authority who will advocate for the work. Without that, you’re fighting uphill. Plan for organizational change. If your project requires people to work differently, plan for how you’ll manage that. Communication, training, support. Be flexible on timing and scope. Phase the work if you can’t do it all at once. Accept delays if they’re unavoidable. Adapt to organizational constraints. Understand the political landscape. Know who the decision-makers are, who influences them, and what they care about. Accept imperfect progress. Good-enough that gets implemented is better than perfect that doesn’t. Practical Takeaways Security projects compete for budget and resources with everything else the organization needs. Frame requests to acknowledge this reality. Organizational change is harder than technical implementation. Plan for managing the human factors. Executive sponsorship is critical. Find someone with authority who will advocate for the work. Communicate risk in business terms, not technical terms. Business impact, regulatory exposure, operational disruption—things leadership is measured on. Be flexible on scope and timing. Phased implementation beats waiting for perfect conditions. Understand the political landscape. Decision-making isn’t purely rational. Relationships and influence matter. Start small, demonstrate value, build credibility for larger initiatives. Prove you can deliver. Good-enough that happens beats perfect that doesn’t. Accept incremental progress. Most security project failures are organizational, not technical. Plan accordingly. The post Week 8: Why Security Projects Fail (And It’s Usually Not Technical) appeared first on Cultivating Security.
-
13
Week 7: Reporting to IT: How to Build Security When You’re Not in Charge
A lot of security people find themselves in this position: you’re the security person, or the security team, reporting up through IT leadership that didn’t come up through security. Maybe your boss is the CIO who built their career in infrastructure. Maybe it’s an IT Director who came up through application development or helpdesk management. Maybe it’s a VP of Technology who understands the business side but not necessarily the security nuances. Or maybe it’s someone in a non-technical role entirely—a VP of Operations, a Chief Administrative Officer, whoever happened to have budget room when the organization finally decided they needed “someone doing security.” You got placed there not because it made organizational sense, but because they didn’t know where else to put you. Nothing wrong with those backgrounds. But the risk calculus is different when you’re thinking about security versus when you’re thinking about delivering projects, maintaining uptime, or supporting users. And if you’re reporting to someone in a non-technical role, that gap can be even wider—you’re translating not just security to IT, but security to business operations without a shared technical foundation. Either way, that difference creates friction that you need to navigate if you’re going to get anything done. This isn’t an impossible situation. Plenty of people build effective security programs while reporting through IT or operations. But it requires skills that aren’t technical—communication, political awareness, strategic patience, and the ability to pick your battles. The Fundamental Tension IT and security have overlapping responsibilities but fundamentally different objectives. IT is measured on delivery. Did the project launch on time? Are systems available when users need them? Are tickets getting resolved? Are users happy? Success is visible and tangible. Security is measured on things not happening. No breaches. No compliance failures. No incidents. Success is invisible, and when you’re doing your job well, it looks like you’re not doing much of anything. That creates a natural tension. IT wants to move fast, deliver projects, say yes to business requests. Security’s job is to make sure those things happen in ways that don’t create unacceptable risk. Here’s where a lot of security people get it wrong: they think their job is to say no. It’s not. Your job is to say “here’s the risk, here are options for managing it, here’s what I recommend, and here’s what happens if we don’t.” That’s a completely different conversation than just “no.” When your IT leadership came up managing infrastructure or applications, their instinct is to solve problems by delivering solutions. Security means helping them deliver those solutions in ways that manage risk appropriately—which sometimes means different approaches, additional controls, or adjusted timelines, but it shouldn’t mean “we can’t do this.” That’s not a personality conflict. It’s structural. And understanding that it’s structural—not personal—helps you navigate it more effectively. Excellent point – this is a HUGE problem. IT using security as the scapegoat/bad guy without actually consulting security is incredibly common and damaging. This probably fits better in a different section though – maybe under “Common Pitfalls” or “What Makes This Harder” rather than “Fundamental Tension.” Here’s why: The fundamental tension is about legitimate structural differences in objectives. What you’re describing is a dysfunctional communication pattern that makes everything worse. When IT Speaks for Security (Without Asking) Here’s a dynamic that complicates everything: IT managers citing security as the reason for decisions without actually consulting security. ‘Security won’t allow it’ becomes the convenient explanation for things they don’t want to do—maybe it’s extra work, maybe it’s technically challenging, maybe they just think it’s a bad idea. Now you’re positioned as the obstacle before you’ve even seen the request. When you do show up to meetings, people expect you to say no—because someone’s already been saying no on your behalf. Sometimes this is well-intentioned. They genuinely think you’d object, so they’re trying to save everyone time. But they’re also putting words in your mouth and framing security as the department of no before you’ve even seen the request. Other times it’s strategic. Security becomes the convenient excuse for not doing things they don’t want to do for completely unrelated reasons. It’s easier to blame security policy than to explain they don’t have the resources, or don’t think the project is a good idea, or just don’t want to. Either way, it’s a problem. Because now when you do show up to meetings, people expect you to say no. You’ve been positioned as the obstacle before you even open your mouth. And when you actually do raise legitimate security concerns, people assume you’re just doing your “security says no” routine again. You can’t completely prevent this, but you can address it: Make yourself available for consultation before decisions get made. If IT can easily ask “would security have issues with this?” they’re less likely to guess at your answer. When you find out it’s happened, have a direct conversation with the person who spoke for you. Not confrontational—just “hey, next time loop me in before using security as the justification, because I might have a different take.” Be consistent about your actual positions. If people know where you actually stand on common issues, they can’t as easily invent objections on your behalf. And when you’re in meetings, be explicit about what you’re actually concerned about versus what you’re fine with. Don’t let people fill in blanks about your position. Understanding Non-Technical Leadership If your boss has a strong technical background but not specifically in security, they understand technology but might not intuitively grasp security risk the way you do. They know what a vulnerability is, but they might not have the pattern recognition to distinguish between “this is bad” and “this is drop-everything critical.” They understand access controls conceptually, but they might not see why least privilege matters so much. They get that patching is important, but the urgency of a critical vulnerability in an internet-facing system might not register the same way. This isn’t ignorance. It’s just a different mental model built from different experience. If your boss doesn’t have a deep technical background at all, the challenge is different. They need you to translate technical risk into operational and business impact. “This vulnerability could allow lateral movement” doesn’t mean anything to them. “If this gets exploited, we lose access to our primary business application and potentially trigger breach notification requirements under state law”—that’s actionable. But here’s the fine line: when you’re translating risk into business impact, you can easily slip into scare tactics—either intentionally or without realizing it. And once you do that a few times, once you cry wolf about critical risks that don’t materialize into actual incidents, your credibility is gone. They start viewing everything you say as overblown. When you actually do have a drop-everything problem, they won’t believe you. So be careful with your language. Have data. Have justification. Don’t say “this could lead to a catastrophic breach” when what you mean is “this is a gap we should close but the exploitability is low and we have other controls in place.” Save the urgent language for things that are actually urgent. They’re managing budgets, vendor relationships, board expectations, competing priorities across IT, and a dozen other things you might not see. Your job is to help them understand security risk in that broader context—accurately, not dramatically. What Actually Works Speak their language. Risk in terms of business impact, not technical details. “This creates compliance exposure” is better than “the encryption implementation is weak.” “This could cause a production outage” is better than “the architecture violates security principles.” You need to be able to explain why something matters in terms they care about. Revenue impact. Regulatory penalties. Customer trust. Operational stability. Audit findings. Those are the frames that resonate. Understand their pressures. IT leadership is constantly balancing competing priorities. If you come in with “we need to do this immediately” every week, you’ve trained them to ignore you. Not everything is actually urgent. Some things are important but can be scheduled. Some things are nice-to-have. Learning to distinguish between “drop everything” and “let’s plan this for next quarter” and “we should do this when we have time” is critical. If you make everything urgent, nothing is. If you’re consistently accurate about what’s actually urgent versus what can wait, you build credibility. Build credibility through small wins. You’re not going to get budget for a SOC on day one. You’re going to get budget for MFA after you’ve demonstrated that the phishing simulation you ran (which cost almost nothing) revealed a genuine problem. And a note on those phishing simulations: stop doing “gotcha” phishing that’s designed to trick people and then shame them. That builds resentment, not awareness. Do educational phishing that reinforces what you’ve actually trained people on. Start with obvious examples, progressively make them harder as people get better. The goal is to build skills and confidence, not to catch people failing so you can prove you were right about needing security training. Show that you understand the business. Deliver value incrementally. Prove that you’re not just identifying problems but helping solve them in ways that are realistic for the organization. Document clearly but professionally. When you raise a risk and it doesn’t get addressed, document it. Not in a “cover your ass” way that’s obvious and annoying, but professionally. Email summary after meetings. Risk register that’s maintained. Clear, factual documentation of what was discussed, what was decided, what the rationale was. If risks you’ve documented do materialize, having a track record of having raised them—not to say “I told you so,” but to establish that your risk assessment is calibrated correctly—builds credibility for the next time you raise something. Align with their goals. If IT is measured on project delivery, figure out how to make security an enabler rather than a blocker. “We can get you to production faster if we build this in from the start rather than trying to retrofit it during UAT” is much more effective than “you need to delay launch for security review.” Find ways to frame security work as helping them achieve their objectives, not preventing them from achieving their objectives. Pick your battles. Not every hill is worth dying on. Here’s what this actually means: You raise a risk. You explain it clearly. You document it. Leadership decides not to address it right now—maybe because of budget, maybe because of competing priorities, maybe because they assess the likelihood differently than you do. Your job at that point is to accept the decision and move on. You’ve done your part by identifying and articulating the risk. You’ve given them the information to make an informed choice. If they decide the risk is acceptable given everything else on their plate, that’s their call to make. Document this in your risk register. The risk, the recommendation, the decision not to address it, who made that decision, and when. This isn’t about blame—it’s about having a factual record of organizational risk acceptance. Other risks are different. These are the ones where the potential impact is severe enough, and the likelihood is high enough, that you can’t just document and move on. For these, you need to understand why the answer was no and whether there’s a path forward. We’ll dive deeper into how to navigate those situations in the next section on political navigation. Wisdom is knowing the difference. And understanding that even when you do everything right on a critical risk, leadership might still make a choice you disagree with. That’s not failure on your part. That’s organizational risk acceptance. If you fight everything with the same intensity, you’ll lose credibility and burn out. If you’re selective—if you clearly differentiate between ‘here’s a risk to be aware of’ and ‘this requires more thorough discussion before we decide’—people learn that when you say something is critical, you mean it. The Political Navigation You’re going to encounter situations where leadership makes decisions you disagree with. Sometimes because they have context you don’t. Sometimes because they’re willing to accept risk you’re not comfortable with. Sometimes because they’re just wrong. Learning to tell the difference is important. When it’s context you’re missing: Ask questions. Understand the trade-offs being made. This is where “Help me understand what’s driving this decision” becomes critical. Maybe it’s budget constraints—there’s no money this year, but it could be revisited next quarter. Maybe it’s resource contention—finishing another critical project would be delayed if people get pulled to work on this. Maybe it’s that they assess the likelihood differently than you do and need more evidence. Maybe it’s organizational politics you’re not seeing. Maybe you didn’t present it clearly enough and they didn’t fully grasp the impact. That information is valuable. It tells you what you need to address to get a different answer. If it’s budget, you can propose a phased approach or wait for the next cycle. If it’s resource contention, you can help find ways to reduce the implementation burden or adjust the timeline. If they need more evidence, you can gather it. If your presentation wasn’t clear, you can reframe it in terms that resonate better with their priorities. Sometimes you get there. You understand the objection, you address it, and the answer changes from “no” to “yes, if we do it this way” or “yes, next quarter.” That’s a win, and it happened because you listened and adapted rather than just pushing harder with the same approach. Sometimes there are business constraints you’re not aware of. Customer commitments that shape priorities. Strategic initiatives that haven’t been announced yet. Budget realities that aren’t your problem to solve. If they have context that changes the risk calculation, that’s legitimate. Your job is to make sure they understand the security implications; their job is to make the decision in the broader context they’re managing. When it’s acceptable risk: Document it clearly. Make sure the decision-maker understands what they’re accepting and what the potential consequences are. Then move on. You don’t have to agree. But if they’re accountable for the decision and they’re making it with clear understanding of the risk, that’s their call to make. At that point, you document the final decision in your risk register—including what you tried, what their reasoning was, and the fact that this is a deliberate acceptance of known risk. It’ll feel wrong sometimes. You’ll see risks that make you uncomfortable. But if you can’t let go of decisions that have been made by people with the authority to make them, you’ll be miserable. Whether you can live with that outcome is a personal decision about whether this organization’s risk tolerance aligns with yours. (If you haven’t read Week 2’s sections on understanding and calibrating risk tolerance, go back and read those now—this is where that concept becomes very real in practice.) When it’s genuinely wrong: Escalate carefully. Bring data. Bring business impact. Bring regulatory implications if they exist. Don’t make it personal. Don’t make it emotional. Make it about the risk and the consequences and why this particular decision creates exposure the organization shouldn’t accept. And understand that the decision might still not change. That’s not you “losing.” That’s the organization choosing a level of risk that you might not be comfortable with—and that’s their right. It’s not your business that you own. You’ve done your job by clearly articulating the risk and the potential consequences. This is where calibrating your risk tolerance to organizational reality becomes essential. (If you haven’t read Week 2’s sections on understanding and calibrating risk tolerance, go back and read those now—this becomes critically important when you’re facing these decisions in practice.) Most of the time, with experience and context, you’ll find you can adapt. You’ll learn to distinguish between risks that genuinely keep you up at night and risks that feel uncomfortable but are actually reasonable business decisions given the organization’s constraints. That calibration is part of professional growth. Sometimes there are genuine misalignments—organizations that are reckless in ways that cross ethical or professional lines. But that’s rare. More often, it’s about learning to work effectively within an organization’s actual risk appetite rather than chasing the theoretical perfect state you might prefer. (Remember Week 2: Fort Knox isn’t the goal. Perfect security that nobody uses, that creates so much friction people work around it, that consumes resources needed elsewhere—that’s not security, it’s security theater that makes things worse. If you’re struggling with this concept, go back and read that entire post.) The Organizational Structure Problem Here’s the uncomfortable truth: if you’re a team of one or two reporting through IT, there’s a ceiling on what you can accomplish. You can build solid foundations. You can prevent a lot of common problems. You can mature processes incrementally. But you can’t transform the security posture of an organization from that position. Security as a function reporting through IT signals something about organizational priorities. It says security is an operational concern that IT manages, not a strategic concern that gets executive attention. That might be appropriate for some organizations. Small companies where IT and security overlap significantly. Mature environments where security is already in good shape and just needs maintenance. Non-regulated industries where the risk profile is genuinely lower. But for a lot of organizations, especially as they grow, that structure becomes a constraint. Security needs input where business decisions get made—not necessarily a C-level executive role, but a position where you’re consulted and brought in early to planning rather than being the afterthought or the checkbox item. It needs the ability to influence architecture at the design stage, not review it after implementation is done. (I’ve written about this concept more fully in Security Third: Why “Security First” Makes Organizations Less Secure—the goal isn’t to put security above everything else, but to make security an everyday consideration woven into how the organization works.) Your job in the current structure is to do good work, build credibility, and position yourself (or your successor, or the function) for that evolution when it comes. Sometimes the transition happens because of growth—the organization reaches a size where security reporting through IT doesn’t make sense anymore. Sometimes it happens because of an incident that makes leadership realize security needs more attention. Sometimes it happens because of regulatory pressure or customer requirements. If you’re doing your job well, you’ll help accelerate that evolution by demonstrating the value of security as a function and building the case for why it needs different organizational positioning. Recognizing Challenges and Finding Paths Forward Some organizational dynamics are challenging but workable. Some are warning signs that need attention. Here’s what to watch for and what you can do about it. Every security proposal gets killed on cost without real discussion of risk. If the answer is always “too expensive” without actually weighing the cost against the risk, you’re not having real conversations about security. What you can do: Start quantifying risk in business terms. Not “we need this security tool,” but “here’s what it costs if this risk materializes, here’s the likelihood, here’s what mitigation costs.” Make it a business decision, not a security decree. Partner with finance if they exist—they speak the language leadership understands, and they can help you establish actual cost figures for your organization. Try this exercise: work with finance to determine what dollar figure in fines, lawsuits, settlements, and remediation costs would make the board, CEO, or owner say “turn off the lights and go home, we’re done.” Then use resources like the Cost of a Data Breach Report to show what incidents in your industry actually cost. It’s not about being dramatic—it’s about quantified risk metrics. This is hard work. Honestly, it’s complex enough that I might write about it separately in the future, because doing it well requires navigating a lot of variables and avoiding arbitrary numbers that don’t actually reflect your organization’s reality. But even an imperfect attempt at quantification is better than “this seems important” when you’re trying to justify security spending. You’re excluded from architecture decisions until implementation is done. If security review happens at the end, when changing anything is expensive and disruptive, you’re set up to either rubber-stamp bad decisions or be the bad guy who delays projects. What you can do: Insert yourself earlier, even informally. Build relationships with architects and senior engineers. Offer to review designs before implementation starts—position it as “I can help you avoid expensive changes later” not “I need to approve this.” Ask project managers or business owners if you can sit in on planning meetings—frame it as learning about upcoming initiatives so you can be more helpful. If there’s a regular architecture review or project kickoff process, work with whoever runs those meetings to get security included as standard practice. Eventually people will start including you because it’s easier than retrofitting security afterward. Leadership treats security purely as a compliance checkbox. If the only security work that gets prioritized is what’s required for audit, the organization doesn’t actually care about security—they care about compliance theater. What you can do: Use compliance as your lever, but be strategic about it. If MFA is “nice to have” for security but “required for SOC 2,” frame it as compliance. It’s not ideal, but it gets things done. But go further: monitor upcoming regulatory changes. Follow requests for comment on new regulations in your industry. Track proposed updates to audit frameworks. When you see something coming, bring it to leadership early: “This isn’t required yet, but it’s proposed for next year. If we implement now, we’re ahead of the mandate and we avoid a scramble when it becomes required.” You’re still using compliance as the hook, but you’re positioning security as forward-thinking business enablement rather than reactive checkbox completion. And sometimes demonstrating that kind of strategic awareness opens doors for broader security initiatives that aren’t tied to specific compliance requirements. (We’ll dig deeper into the compliance-versus-security tension in Week 9, including how to navigate organizations that treat compliance as the definition of security. For now, just understand that compliance can be a useful forcing function even when it’s not the same as actual security.) You’re expected to rubber-stamp decisions that are already made. If your opinion is requested but not actually valued, if you’re there to provide cover rather than actually influence outcomes, that’s not a real security role. What you can do: Stop rubber-stamping, but don’t become a blocker either. When presented with a done deal, respond with: “OK, you can proceed. I’ll work on the formal risk assessment and get that to you—it might take me a few days to review this properly and understand all the implications, I might need more insight from the team. But you’re not blocked waiting on me.” Then deliver an actual assessment. Document the risks you see, your recommendations, what controls might mitigate the concerns. Even though the decision is already made, you’re establishing a record and demonstrating that security review means something. Over time, people start to realize they’d rather have those conversations up front than get a risk assessment afterward that highlights everything they should have done differently. You’re not changing the immediate decision, but you’re creating incentive for earlier involvement next time. Nobody wants to hear about risks that don’t have cheap, easy solutions. Security work involves hard trade-offs. If leadership only wants to address risks that are trivial to fix, they’re not serious about managing actual risk. What you can do: Bring options, not just problems. “Here’s the risk. Here’s the ideal solution. Here’s what we could do with half the budget. Here’s what we could do with no budget but more time. Here’s what happens if we do nothing.” And remember: the “ideal solution” isn’t always Fort Knox-level security. (Go back to Week 2 if you need a reminder—perfection is often anti-security.) Sometimes the ideal solution is the one that actually gets implemented and used, even if it’s not theoretically perfect. Make it easier for them to say yes to something realistic rather than forcing them to choose between perfect and nothing. Some of these dynamics shift with patience and credibility-building. If you demonstrate value over time, if you build relationships, if you show that you’re helping solve problems rather than just identifying them, organizational culture can evolve. But be honest with yourself: are you making progress, or are you just spinning your wheels? If you’ve been trying these approaches for a year or more and nothing is changing, that tells you something about whether the organization is actually ready to invest in security. Building From Where You Are If the situation is workable—not ideal, but workable—here’s how to make progress: Start with foundational hygiene. Asset inventory, patch management, MFA, basic logging. Unglamorous work, but essential—and hard to argue against. These are things that benefit IT broadly, not just security. Build relationships across IT. The network team, the sysadmins, the database folks, the developers, the helpdesk staff who see phishing attempts first. They’re your early warning system and your implementation partners. You need them on your side. Invest in your business acumen. Understand how your organization makes money. What the critical business processes are. Who the key stakeholders are. What the competitive pressures and strategic priorities look like. Security people who understand the business get taken seriously. Security people who only understand security get sidelined. Find executive sponsors. Maybe it’s not your direct boss, but there’s someone in leadership who gets it. The CFO who’s concerned about financial risk. The COO who’s worried about operational disruption. The General Counsel who understands regulatory exposure. Build relationships with people who have organizational authority and who understand why security matters. They can be allies when you need support for initiatives your direct chain of command doesn’t prioritize. Know when to move forward. You won’t get every resource you ask for. You won’t address every risk you identify. Leadership will make decisions you disagree with. Document your position in your risk register. The risk, your recommendation, the decision not to address it, who made that decision, and when. Make sure leadership understands what they’re accepting. Then focus on the things you can influence. Continuing to fight decisions that have been made just burns credibility and energy. Position for growth. Whether that’s security eventually getting its own reporting line, or you moving to an organization with more mature security structure, or the function evolving as the company grows—organizations and roles change over time. Do good work. Build skills. Develop organizational literacy and political awareness. These are valuable regardless of where your career goes next. And here’s something worth remembering: there will be challenging days, weeks, even months in security work. But there are also those days that make being in security entirely worth it—when you prevent something bad from happening, when leadership finally understands what you’ve been saying, when a process you built actually works in a crisis. Those days erase a lot of the troubling times. They’re worth sticking around for. The Long Game This is not a quick fix situation. Building security while reporting through non-security leadership takes time, patience, political skill, and the ability to accept imperfect progress. You’re playing a long game. Incremental wins. Relationship building. Demonstrating value. Building credibility so that when you need support, you have it. Some people find this frustrating. They want to come in, fix everything, build the perfect security program. That’s not realistic in this structure. Other people find it satisfying in its own way. Taking an organization from “security is an afterthought” to “we have solid foundations and we’re continuously improving”—that’s real accomplishment, even if it’s not perfect. Know which type of situation you’re in. Know what’s actually possible given your organizational constraints. And decide whether that’s acceptable to you or whether you need to be somewhere with different structure. There’s no shame in concluding that a particular organizational structure doesn’t work for you. But give it a fair chance first. A lot of good security work gets done by people who don’t have perfect organizational positioning but who figure out how to operate effectively anyway. Practical Takeaways Understand the fundamental tension between IT delivery focus and security risk focus. It’s structural, not personal. Your job isn’t to say no—it’s to articulate risk, provide options, and help leadership make informed decisions. Translate technical risk into business impact. Speak in terms leadership actually cares about: revenue, compliance, operational stability, customer trust. Build credibility through small wins before asking for major resources. Prove you understand the business. Document risks and decisions in your risk register. Not for CYA, but to create a factual record of organizational risk acceptance. Align security work with IT’s goals where possible. Be an enabler, not just a blocker. Don’t let IT speak for security without consulting you first. Make yourself available and be clear about your actual positions. Bring options, not just problems. Multiple approaches at different cost/effort levels make it easier for leadership to say yes to something. Understand why decisions go against you before deciding how to respond. Context matters—budget, resources, competing priorities, or clarity of presentation. Build relationships across IT and with executive sponsors outside your direct chain. Monitor upcoming regulatory changes proactively. Position security as forward-thinking business enablement, not reactive compliance. Know when to move forward. Document decisions, then focus energy on things you can influence. Accept that there’s a ceiling to what you can accomplish in this structure, but work toward organizational evolution. There will be challenging times, but also moments that make security work entirely worthwhile. Those days are worth sticking around for. The post Week 7: Reporting to IT: How to Build Security When You’re Not in Charge appeared first on Cultivating Security.
-
12
Week 6: Vendor Relationships Aren’t Partnerships (No Matter What the Sales Deck Says)
Every vendor will tell you they’re committed to security. They take it very seriously. They’re a trusted partner in your security journey. They understand your challenges and they’re here to help. None of this means anything. I’m not saying vendors are malicious. Most aren’t. But they’re businesses with business objectives, and those objectives aren’t perfectly aligned with yours. They want to sell you products, expand their footprint in your organization, and minimize their liability. You want tools that work, reliable service, and transparency when things go wrong. Those aren’t the same thing. Understanding this dynamic—and adjusting your expectations and processes accordingly—is critical. Because you’re going to rely on vendors for critical infrastructure and applications, and if you approach those relationships with naive trust instead of informed skepticism, you’re going to get burned. The Marketing vs. Reality Gap Vendor marketing materials are aspirational, not descriptive. “Enterprise-grade security.” What does that actually mean? Depends on the vendor. Might mean they encrypt data at rest. Might mean they have SOC 2. Might mean they did a penetration test once. It’s a phrase that sounds impressive and commits them to nothing specific. “Comprehensive audit logging.” We talked about this in the visibility post. Comprehensive to them might mean they log authentication. You need data access events? That’s a different tier. You need API call logs? That’s custom pricing. You want real-time log forwarding to your SIEM? Oh, they don’t support that. But there’s an Excel export you can run manually. Once a day. During business hours. So much for your real-time security monitoring. The gap between what the marketing promises and what the product actually delivers can be significant. “24/7 support.” Sure, you can open a ticket anytime. Whether anyone actually looks at it in a timely fashion is a different question. And here’s where you need to read the SLA carefully—because “24/7 support” might mean they’ll acknowledge your ticket within 24 hours, not that they’ll actually start working on it. Acknowledgment and resolution are very different things, but the sales deck doesn’t make that distinction. (We’ll get into SLA definitions more in a minute—they’re worth understanding before you need them.) “Seamless integration.” It integrates with your environment using documented APIs and standard protocols. Whether that integration actually works smoothly, whether it requires custom scripting, whether it breaks when either system updates—that’s the reality you discover after purchase. The sales process is about showing you the best-case scenario. Your job is to figure out what the realistic scenario looks like. Due Diligence That Actually Matters Most vendor security assessments are theater. You send them a questionnaire with 200 questions about their security practices. They send back answers that were written once and are used for every customer. “Yes, we encrypt data in transit and at rest.” “Yes, we perform regular penetration testing.” “Yes, we have an incident response plan.” None of this tells you anything useful. And here’s another problem: vendors redefine common industry terms to mean whatever their product actually does. They say ‘multifactor authentication.’ You’re thinking MFA app, hardware token, something actually secure. They mean username and password, plus we’ll email you a code—or worse, SMS. Email is explicitly prohibited by NIST SP 800-63B-4 for out-of-band authentication. It doesn’t prove possession of a specific device and is too vulnerable to compromise. SMS codes are now classified as a ‘restricted authenticator’ under NIST SP 800-63B-4 (Section 3.2.9). Organizations can still use them, but only if they: offer alternatives, inform users of the risks, implement additional mitigations like monitoring for SIM swaps and device porting, and maintain a migration plan to move away from SMS over time. That’s the 2025 guidance—SMS is tolerated, not recommended, and organizations using it need to acknowledge they’re accepting known risks. You ask about supporting an authenticator app—maybe the one your entire organization already uses for everything else. “Oh, that’s on our roadmap. It’s too complex to implement right now.” Too complex. Too costly. But emailing codes? That they can do. That’s simple enough. They say “network segmentation.” You’re thinking proper isolation with stateful firewalls controlling traffic between segments. They mean separate subnets with no actual access control in between. Traffic flows freely; they just put different IP ranges in different VLANs and called it segmentation. And sometimes the sales guy is so non-technical—and the sales engineer assigned to help him is barely better—that neither of them can actually answer detailed questions without taking them back to the dev team. You ask about SAML attribute mapping or API rate limits or log retention policies, and you get “let me check with engineering and get back to you.” Which should tell you something about how well this is going to go when you need support. What you actually need to know: What access controls exist and how granular are they? Can you implement least privilege or is it all-or-nothing permissions? Can you restrict access by IP, by time of day, by specific actions? Or do you get “admin” and “user” and that’s it? What logging is actually available and what does it cost? Not what they claim to log—what can you actually export, in what format, with what latency, and is there a separate fee for it? Can you get the logs in real-time or is there a 24-hour delay? Can you export to your SIEM or do you have to use their analytics platform? What happens during an incident? Not what the glossy incident response plan says—what actually happens? How do you get notified? What information do they provide? How quickly? Are they transparent about root cause or do they give you vague reassurances? Do they help you determine if your data was exposed or do you have to figure that out yourself? And here’s a critical one: what does “immediate notification” mean in their contract? Because I’ve seen vendors interpret that as “we’ll notify you after the forensic investigation is complete”—which turns out to be three months after the breach, five months after your data was actually exposed and already circulating in the wild. Their lawyers and your understanding of “immediate” might be very different things. (The Marquis breach is a recent example of exactly this problem—customers finding out about compromised data through third parties rather than timely vendor notification. I wrote about that here.) How do they handle vulnerabilities? What’s their patching cadence? How do they notify customers? Do they provide advance notice before making changes that might affect you? If they have a significant security issue, when do you find out—before or after it’s public? (Hint: it’s likely after.) What are their subprocessors and where is your data going? They’re probably not running everything themselves. What third parties have access to your data? Where is it geographically? Do you have any control over that? Some vendors are transparent about this. Others you have to push to get answers. What’s the data retention and deletion policy? When you delete something, is it actually deleted or just hidden from your view? When you terminate service, how long until your data is actually removed from their systems? Can you verify deletion happened? And here’s one that almost nobody asks: what happens during their disaster recovery testing? Can your “deleted” data be accidentally restored from a DR backup during a test or actual recovery event—and then just sit there because someone forgot to re-delete it? You got a certificate of destruction when you terminated the contract. That should mean something, right? Except now your data is back in their environment because of a DR restore, and nobody on their ops team knows it’s supposed to be gone. Because that happens. Ask specific questions. Push for specific answers. “We take security seriously” isn’t an answer. SLAs Are Not What You Think They Are Service Level Agreements sound protective. The vendor commits to certain uptime, response times, support levels. If they don’t meet those commitments, you get… credits. Service credits don’t help you when your business-critical application is down and costing you revenue. They don’t help you when a security incident happens and you can’t get answers from the vendor. They don’t compensate you for reputational damage or regulatory penalties. Read the SLA carefully. The uptime commitment probably excludes “scheduled maintenance” (which might be announced with 24 hours notice at their discretion). The response time probably measures when they acknowledge your ticket, not when they actually resolve it. The support commitment probably has exclusions for issues caused by “customer misconfiguration or misuse” (defined however they want). And here’s a question: can you even see the closure codes the support technician uses when they close your ticket? Because if they mark it as “customer error” or “working as designed,” that ticket doesn’t count against their SLA—even if the issue was absolutely on their side. You might not even know they blamed you for it unless you specifically ask for the closure reason. And good luck disputing it after the fact. And the remedies are usually capped at a small percentage of your monthly fees. So if they have a major outage that costs you significant money, your remedy is getting back a fraction of what you paid them that month. I’m not saying SLAs are worthless. But they’re not protection. They’re a contract term that establishes minimum expectations and limited remedies. Treat them accordingly. The Support Reality Vendor support quality varies wildly, and you often don’t know what you’re getting until you need it. Tier 1 support is usually reading from scripts. “Have you tried turning it off and on again?” “Have you checked the documentation?” They’re there to handle common issues and escalate everything else. If you have a complex technical question or a security concern, you’re going to spend time with tier 1 before you get to someone who can actually help. Higher tiers of support usually require higher-priced plans. “Enterprise support” costs more. “Priority escalation” costs more. “Dedicated support engineer” definitely costs more. And what does “dedicated” actually mean? Are they supporting just you? Or are they “dedicated” to five customers? Ten? Fifteen? With the hope that not all of you call at the same time? The vendor’s definition and your expectation might not align. Is it worth it? Depends on how critical the service is and how much risk you’re willing to accept. If the vendor going down means your business stops, you probably need the expensive support tier. If it’s a nice-to-have tool, maybe you can live with slower response. But understand that support quality is variable. Sometimes you get someone who really knows the product and can help you. Sometimes you get someone who’s reading documentation you could have read yourself. This is true of every vendor—even the expensive enterprise ones. Here’s something practical: as your team engages with vendor support, document the questions they ask and the exact process they follow. You’ll quickly discover they’re reading from a script. That’s not a criticism of the support person—that’s literally their job. So learn the script. Know what they’re required to ask. Have the answers ready. Understand what the escalation triggers are. I’ve seen this turn a one-hour initial troubleshooting call into 20 minutes, because I already had the information they needed to check their boxes and move to the next step. It saves time. It saves sanity. And it gets you to someone who can actually help faster. The Update and Change Problem Vendors are going to update their products. New features, bug fixes, security patches. You generally want this—products should improve over time. But vendors update on their schedule, not yours. And sometimes those updates break things. SaaS products are particularly tricky here. You don’t control when updates happen. The vendor pushes an update, and now you’re running a new version whether you wanted to or not. If that breaks an integration or changes behavior you depended on, too bad. And depending on the vendor’s DevOps maturity, “updates” might mean a developer created a feature request, pulled the code, edited it, maybe ran some automated tests, and pushed it straight to production. Problem solved for the one customer who asked for it. Problem created for two others who depended on the old behavior. That’s not a theoretical concern—that’s reality at immature organizations masquerading as modern SaaS providers. Responsible vendors give advance notice. “We’re changing this API endpoint on this date.” “This feature is being deprecated in 90 days.” That gives you time to adapt. But here’s the problem: who’s actually getting those notifications? Is it going to a shared mailbox nobody monitors? Is it going to the person who set up the integration three years ago and doesn’t work there anymore? And even if the right person gets it, do they understand what they’re reading? Do they comprehend the downstream impact? Change notifications aren’t business-as-usual “thanks for the FYI” emails. You need to make sure someone on your side—ideally the business owner of that integration—actually reads these things and understands whether it’s “small change, thanks for the heads up” or “oh crap, this is going to break our critical integration and we have 90 days to fix it.” Less responsible vendors just make changes. You find out when something stops working. And then you’re scrambling to figure out what changed and how to fix it. There’s not much you can do about this except build in resilience. Don’t build critical processes that depend on undocumented vendor behavior. Have fallbacks where possible. Monitor for changes. Accept that you’re not in control of the release schedule. Vendor Lock-In Is Real Once you’re using a vendor’s platform, switching is expensive. Data migration is hard. Especially if the vendor doesn’t make it easy to export in a usable format. Especially if you’ve accumulated years of data. Especially if the new vendor’s import process is fragile or lossy. Integration work has to be redone. All those connections to other systems, all that custom scripting, all that workflow configuration—none of it carries over to a different platform. Staff knowledge doesn’t transfer. Your team has learned this platform. Moving to a different one means retraining. Means hiring people with different skills or waiting for your current staff to ramp up. Means reduced productivity during the transition. This isn’t an accident. Vendors know that switching costs keep customers around even when they’re unhappy. So they design products that integrate deeply, that store data in proprietary formats, that require specialized knowledge. I’m not saying you should avoid vendor products or refuse to integrate deeply. But go in knowing that you’re making a long-term commitment. Switching later will be painful and expensive. Make sure you’re choosing vendors you can actually live with for a while. The Breach Notification Question When a vendor gets breached, you need to know. Promptly. With enough detail to assess your risk. What you actually get depends on the vendor. Good vendors will notify you quickly—often before the breach is public. They’ll tell you what happened, what data was potentially exposed, what they’re doing about it. They’ll be transparent about the timeline and the scope. Less good vendors will wait until they’re legally required to notify, hope it doesn’t make the news, and send you a vague letter about “a security incident” without much detail. You’re left trying to figure out if your data was actually affected and what you should do about it. And some vendors will fight disclosure entirely. Claim it wasn’t actually a breach. Claim no customer data was accessed (even if systems that contained customer data were compromised). Minimize the severity. Only acknowledge what they have to. This is one of those things you can’t really evaluate during the sales process. But you can include notification requirements in your contract. “Vendor shall notify Customer within 24 hours of discovering a security incident that may affect Customer data.” Pay attention to the language here. “Shall” and “must” are binding obligations. “Will” is generally binding but slightly weaker. “Should” is advisory—nice to have, but not enforceable. Your vendor’s lawyers know this. You should too. And if this is a mission-critical application where downtime costs you revenue, or where a breach would trigger regulatory penalties, don’t just hand this language to your legal team in a vacuum. Give them context. Explain why prompt notification matters for this specific vendor relationship. They can’t negotiate effectively if they don’t understand what you’re actually trying to protect. It’s not perfect, but it’s better than nothing. The Shared Responsibility Trap Cloud vendors especially love to talk about shared responsibility. They’re responsible for security “of” the cloud—the infrastructure. You’re responsible for security “in” the cloud—your configurations and data. This is technically true. But it’s often used to deflect blame. There’s a misconfiguration that exposes your data. “That’s your responsibility—we provide the controls, you have to use them correctly.” There’s a vulnerability in the platform. “We patched it, but you didn’t enable the security feature that would have mitigated it.” Sometimes this is fair. If you leave an S3 bucket publicly accessible because you didn’t configure permissions correctly, that’s on you. The controls existed, you just didn’t use them. Sometimes it’s not fair. If the default configuration is insecure and you have to know to change it, that’s poor design. If the security controls are buried in obscure settings, that’s a UX failure. If the documentation doesn’t clearly explain the risks, that’s inadequate disclosure. The point is: shared responsibility means you need to understand your part. You can’t just trust that the vendor has secured everything. You have to configure things correctly, enable available security features, understand what your responsibilities actually are. And when something goes wrong, “shared responsibility” might mean the vendor denies any fault and you’re left holding the bag. What You Can Demand (And What You Can’t) You have more leverage than you think during negotiations, but less than you’d like after you’ve signed. Before you sign, you can negotiate contract terms. Data location requirements. Security requirements. Audit rights. Notification timelines. Indemnification. Data deletion procedures. Not every vendor will agree to everything, but many will agree to some things, especially for larger contracts. Watch out for vendors who subtly change words during negotiation. You propose language with “shall.” The redline comes back with “should.” You ask for “will notify within 24 hours.” They counter with “will make reasonable efforts to notify.” This isn’t accidental. These word changes fundamentally alter who’s responsible and what’s enforceable. If you let them slide, you’ve just given up your leverage. And here’s a major red flag: you ask to include something specific in the contract, and the vendor says “oh, we don’t need to put that in writing—that’s just our standard practice” or “that’s basic customer service, we always do that.” No. If it’s important enough for you to ask for, it’s important enough to write down. If they actually do it as standard practice, then writing it down shouldn’t be a problem. The fact that they’re resisting putting it in the contract tells you they want the flexibility to not do it when it’s inconvenient. Get it in writing. After you sign, your leverage is limited. You can escalate issues. You can threaten to leave (which they know is expensive and time-consuming for you). You can leave negative reviews or tell your peers. But fundamentally, the contract you signed is what you have to work with. So get it right up front. Don’t assume you can fix contract problems later. Don’t accept vague language hoping you can interpret it favorably. Get specific commitments on the things that matter to you. And know your deal-breakers. If a vendor won’t commit to reasonable security practices or transparency, that tells you something. Maybe they’re worth the risk anyway because they’re the only option for what you need. But at least you know what you’re getting into. The Realistic Approach You’re going to use vendor products. You don’t have a choice. Building everything in-house isn’t realistic for most organizations. And even if you wanted to, buying software to install on your own infrastructure is rapidly disappearing. SaaS is the model now. The goal isn’t to avoid vendors. The goal is to manage vendor risk intelligently. A note on scope: These patterns apply whether you’re buying SaaS tools, outsourcing SOC functions, or working with managed service providers. The incentive misalignment is structural. A vendor selling you a SIEM, an MDR provider monitoring your environment, a penetration testing firm doing annual assessments, a managed IT services company handling your infrastructure—they’re all businesses with business objectives that aren’t perfectly aligned with yours. The specifics vary, but the underlying dynamic is the same. Understand it and adjust your approach accordingly. Do real due diligence. Ask specific questions. Test answers. Talk to existing customers if possible—and don’t just talk to their hand-picked reference accounts. Find the customers they don’t want to put in front of you. Those conversations will tell you what actually happens when things go wrong. Don’t just trust the marketing. Get security and privacy commitments in the contract. Don’t rely on verbal assurances or marketing materials. Plan for failure. What happens if the vendor has an outage? What happens if they get breached? What happens if they go out of business? Have answers. Monitor vendor security posture over time. Things change. A vendor that was secure last year might have degraded. A vendor with good practices might have been acquired by someone with worse practices. Document everything. Issues, concerns, failures, support response times, outages, security incidents. Keep that documentation. When it comes time to renew the contract, this is valuable leverage. You can push for improved SLAs, better service commitments, or reduced pricing based on documented performance problems. Without documentation, you’re negotiating from memory. With it, you have facts. Try to build relationships with vendor security and support teams where possible. For smaller vendors or if you’re a significant customer, you might actually get access to people who know your environment. When you need help, having a contact who remembers your setup makes a difference. But be realistic: for most SaaS vendors, especially the larger ones, you’re not getting that. You’re getting ticketing systems, rotating support staff, and generic email addresses. The person who helped you last month doesn’t work that account anymore. The “dedicated” account manager changes every year. So focus on what you can control: document your environment thoroughly, keep records of your configurations and integrations, and build internal knowledge so you’re not dependent on vendor institutional memory that doesn’t exist. And maintain healthy skepticism. Vendors aren’t enemies, but they’re not your partners in the sense of having aligned interests. They’re businesses selling you products and services. Their ultimate goal is to extract money from your company while minimizing their own costs. If they’re publicly traded, that pressure is explicit—maximize revenue, minimize expenses, hit quarterly numbers. If they’re a smaller SaaS startup whose founders are aiming for acquisition, the goal is highest possible valuation at lowest possible cost. Either way, “investing in security infrastructure” or “building robust support capabilities” competes directly with profit margins and growth metrics. Understanding these incentives helps explain a lot of vendor behavior. It’s not personal. It’s structural. Treat them accordingly—professionally, but without illusions about the nature of the relationship. Practical Takeaways Marketing language means nothing. Push for specific, technical answers about capabilities and limitations. Due diligence should focus on access controls, logging, incident response, and subprocessor relationships—not generic questionnaires. SLAs establish minimum expectations and limited remedies. They don’t actually protect you from significant harm. Support quality varies. Evaluate what tier you actually need based on how critical the service is. Vendor lock-in is real and intentional. Choose vendors knowing you’re making a long-term commitment. Shared responsibility means you need to understand and fulfill your part. The vendor won’t do it for you. Negotiate security requirements before signing. You have much less leverage afterward. Plan for vendor failures, breaches, and business changes. Hope for the best, prepare for the realistic worst case. Vendors are businesses with business objectives. Your job is to get what you need from them while managing the inherent risks. The post Week 6: Vendor Relationships Aren’t Partnerships (No Matter What the Sales Deck Says) appeared first on Cultivating Security.
-
11
Week 5: The Identity Sprawl Problem
Identity used to be simple. Users had accounts. Accounts had passwords. You managed them in Active Directory or LDAP. Authentication happened at the perimeter, and once you were inside, you were mostly trusted. That model is completely inadequate for how organizations actually work now. Identity is the perimeter now. Users authenticate to dozens of different services. Applications authenticate to other applications. Service accounts, API keys, OAuth tokens, federated access, just-in-time provisioning—it’s not one directory with one authentication method anymore. It’s a sprawling mess of identity stores, authentication mechanisms, and access patterns that most organizations only partially understand. And this sprawl isn’t theoretical. It’s the attack surface that matters most. Credential theft, account compromise, privilege escalation, lateral movement—these are the techniques that actually succeed in breaches. When organizations get compromised, it’s usually because someone got access to credentials that gave them more than they should have had. So if identity is the real perimeter, you’d think we’d have it locked down. But most organizations are underwater on this, and many don’t even realize how bad it is. Why Identity Got Complicated It didn’t used to be this way. When infrastructure was mostly on-premises and applications were mostly internal, you had centralized identity management. Active Directory for Windows environments, LDAP for Unix. Users logged in once, got a ticket or token, and that authenticated them to internal resources. Service accounts existed but they were relatively few and you could keep track of them. Then SaaS happened. Now your users are authenticating to Salesforce, Office 365, Google Workspace, Slack, GitHub, AWS, Azure, and fifty other services. Some of these federate back to your directory. Some don’t. Some use SAML, some use OAuth, some use proprietary authentication mechanisms. Then cloud happened. Your applications now run in AWS or Azure or GCP, and they need to authenticate to cloud services. So you have IAM roles, service principals, managed identities. And your on-premises applications still need to talk to cloud services, so now you have hybrid identity scenarios. Then APIs became the primary way applications interact. Every integration is an API call, and every API call needs authentication. API keys, OAuth client credentials, service account tokens—they proliferate like weeds. Then CI/CD pipelines became critical infrastructure. Your deployment pipelines need access to repositories, artifact storage, cloud infrastructure, production systems. Those are credentials too, often highly privileged ones. The result is identity everywhere, in forms that don’t fit the centralized management model we used to have. The Service Account Problem User accounts are at least somewhat visible. You can see them in your directory. You can review them periodically. You can enforce MFA. You can detect anomalous authentication patterns. Service accounts are where things get messy. Applications need credentials to run. Batch jobs need credentials. Integrations need credentials. These aren’t interactive users—they’re automated processes. And they need access to data, to APIs, to infrastructure. How do you manage that? In a lot of organizations, the answer is “poorly.” Service accounts with passwords that never change. Hard-coded credentials in application config files. API keys stored in environment variables. Shared credentials used by multiple systems. Overprivileged access because it was easier to just give broad permissions than to figure out exactly what was needed. And then there’s the vendor problem. You’re installing some enterprise application, following the vendor’s deployment guide, and you get to the service account section. What permissions does it need? “Oh, just make it a Domain Admin—we’ve found that works best.” Wait. What? Yeah. Domain Admin. For a reporting tool. Or a backup agent. Or some middleware that touches three specific file shares. The vendor engineer on the call says it with a straight face, like this is perfectly reasonable. And if you push back, if you try to scope it down to actual required permissions, you’re told that’s “unsupported” or that they “can’t guarantee functionality” if you don’t follow their documented requirements. So now you’ve got a choice: deploy the thing your organization paid for according to vendor specs (with Domain Admin credentials sitting in a config file somewhere), or spend days reverse-engineering what it actually needs, knowing that when something breaks, vendor support will point right back to that non-standard configuration. And these credentials tend to be long-lived. A user password might get changed when someone leaves or every 90 days or when MFA gets enforced. A service account password might literally never change once it’s set up, because changing it means updating all the places that use it, and nobody’s entirely sure where all those places are. This is a massive security problem. If an attacker gets one of these credentials, they have persistent access that might not trigger any of your detection mechanisms because the activity looks like normal automated operations. The API Key Explosion APIs are how modern applications work. Your web application calls an API to authenticate users. Your mobile app calls an API to sync data. Your third-party integrations call APIs to exchange information. Every one of those API calls needs authentication. Sometimes it’s OAuth with relatively short-lived tokens. Sometimes it’s API keys that don’t expire. Where do those keys live? In configuration files. In environment variables. In secrets management systems (if you’re doing it right, but a lot of organizations aren’t). In developer laptops. In documentation. In Slack messages and email threads and wiki pages. And here’s a question you should ask but probably don’t want to know the answer to: are there even separate keys for dev and prod? You’d be surprised how often the answer is no. Or “technically yes, but we use the prod key in dev because the dev environment kept having issues” or “we copied prod to staging to troubleshoot something and never changed it back.” So now your production API credentials are sitting in three different environments with different security controls, different access policies, and different definitions of “who can SSH into this box.” They spread organically. Developer needs to integrate with a service, generates an API key, uses it. Works great. Key is now embedded in code or config. Gets committed to a repository. Gets copied to other environments. Gets shared with other team members who need to work on that integration. Six months later, nobody remembers that key exists. It’s still valid. It still has whatever permissions were granted initially. It’s still sitting in places nobody’s tracking. Rotate it? Maybe if you’re lucky it’s documented somewhere and someone remembers to include it in a rotation process. More likely it’s forgotten until it either breaks something or shows up in an incident investigation. And then there’s the vendor API problem—the other side of the coin from all those keys you’re managing internally. You’re evaluating a new SaaS product. It’s marketed as a “modern cloud platform” with “seamless integrations” and “API-first architecture.” Great. You need to integrate it with your existing systems. You ask about their API. They send you documentation. You open it up and find XML-based requests with username and password authentication embedded in the payload. Not OAuth. Not API keys with proper rotation support. Literally username and password in XML, transmitted with every request. In 2025. You ask about OAuth support or token-based authentication. “That’s on our roadmap,” they say. Or “our enterprise customers haven’t requested that.” Or my personal favorite: “our API is very secure—we support TLS encryption.” Yes. TLS is the bare minimum for transmitting anything over the internet. That’s not a security feature; that’s table stakes. It doesn’t make username-and-password-in-every-request a good authentication model. But this is a vendor your organization has already committed to. Contract’s signed. Budget’s allocated. Integration needs to happen. So now you’re building connectors to a “modern” platform using authentication patterns that were outdated a decade ago, and you get to explain to your auditors why you’ve got service credentials being transmitted with every API call instead of using token-based auth with proper expiration and rotation. The vendor will eventually update their API. Probably. In a few years. After enough customers complain. And then you’ll get to rewrite all your integrations to use the new endpoints, because they won’t maintain backward compatibility with the old authentication model forever. This is what “API-enabled” sometimes means in the vendor world OAuth and Federation (Better But Still Complicated) OAuth and SAML federation are massive improvements over hardcoded passwords and long-lived API keys. No question. Instead of giving every application its own credential store, you authenticate centrally and get tokens that prove your identity. Tokens expire, which limits the window of compromise. Tokens can have scopes that limit what they can access. You can revoke them centrally. This is better. But it’s not simple. You’ve got multiple identity providers. Your corporate IdP for employees. Social logins for customers. B2B federation for partners. Each one is a trust relationship that needs to be configured correctly. You’ve got token lifetimes to manage. Too short and you’re constantly re-authenticating users (bad user experience, and they’ll find ways around it). Too long and a compromised token has extended validity. You’ve got consent flows and scope management. What can this application actually access? Did the user consent to that? Does the application request appropriate scopes or does it ask for everything just in case? You’ve got refresh tokens, which are long-lived credentials that can be used to get new access tokens. If an attacker gets a refresh token, they can maintain access even after access tokens expire. Where are refresh tokens stored? How are they protected? Most people don’t have good answers. And then there’s the quality problem with SSO implementations themselves. You’d expect that if a SaaS vendor advertises SAML support, they’ve implemented the core spec requirements—including certificate rotation. Here’s what actually happens: I’ve seen major vendors hard-code SAML signing certificates in their implementations. Not as bugs—as design choices. (This reflects a pattern I’ve encountered multiple times across the industry with various vendors, large and small, startup and mature.) When the cert nears expiration and needs rotation, the only path forward is to completely tear down the SSO configuration and rebuild it from scratch. New metadata exchange. New attribute mapping. New testing. All your users re-provisioned. The vendor’s response when you escalate? Schedule a maintenance window for the rebuild. Certificate rotation isn’t an edge case. The OASIS SAML 2.0 Metadata specification explicitly allows multiple signing keys to be published simultaneously using KeyDescriptor elements to support planned certificate rollover. This enables relying parties to trust both the current and future signing certificates during a transition window—preventing outages and eliminating the need for destructive reconfiguration. This expectation also aligns with broader security guidance from NIST SP 800-57, which treats cryptographic key lifecycle management and rotation as foundational security hygiene, not optional enhancements. When a SaaS vendor implements SAML in a way that cannot tolerate routine certificate rotation without downtime or rebuilds, that’s not a limitation of SAML—it’s an implementation failure. And this wasn’t a quick fix. This was days of work—coordinating with the vendor, scheduling downtime, communicating to users, rebuilding configs, testing, hoping nothing broke. The kind of thing that makes your identity team lose faith in humanity for a solid month. This is what “SSO-enabled” sometimes means in practice. The vendor checks a box on their feature matrix, passes whatever minimal validation their sales team needs, and ships something that technically works but operationally fails the moment you need to do normal security hygiene. And you’ve got service-to-service OAuth flows (client credentials grant), which in practice can end up looking a lot like the API key problem—long-lived credentials that need to be managed and rotated and tracked. The Visibility Problem Here’s a question: how many service accounts exist in your environment right now? If you can answer that accurately within 10%, you’re ahead of most organizations. How many API keys are currently valid? Where are they stored? What do they have access to? When were they last used? How many OAuth applications are integrated with your identity provider? What scopes have been granted? Which ones are actively used versus which ones were set up for testing and forgotten? How many shared credentials exist—passwords or keys that multiple people or systems know? Most organizations genuinely don’t know. They have partial answers. The service accounts in Active Directory are documented (maybe). The API keys the platform team knows about are tracked (possibly). But the complete picture? That’s rare. And without visibility, you can’t manage the risk. You can’t rotate credentials you don’t know exist. You can’t enforce least privilege if you don’t know what access has been granted. You can’t detect anomalous usage if you don’t know what normal usage looks like. The Least Privilege Gap Every security framework says you should implement least privilege. Only grant the access that’s actually necessary. Review permissions regularly. Remove access that’s no longer needed. In practice, this is incredibly hard with sprawling identity. When you set up a new integration, do you carefully analyze exactly what permissions are required and grant only those? Or do you grant broader permissions to make sure it works, intending to narrow it down later, and then never actually getting around to the narrowing part? When someone changes roles, do you remove their old permissions and grant only the new ones? Or do they accumulate permissions over time because removal is risky (what if they still need that access for something?) and nobody wants to break things? And then there’s the cross-training excuse. “I need to keep my old access so I can train my replacement.” Okay, fair enough—for a while. But at what point is that training actually done? Two weeks? A month? Six months later when they still have full admin rights to systems they haven’t touched in half a year? The problem is that “training my replacement” becomes “I might need to help out occasionally” becomes “well, we never know when we’ll need someone who understands the old system” becomes permanent access that nobody questions because there’s always some theoretical justification. And nobody wants to be the person who removes access and then gets blamed when something breaks. So the permissions stay. They accumulate. The person who started in desktop support five years ago and moved through three different roles? They’ve probably still got local admin rights on workstations they haven’t touched in years, plus database access from when they helped with a migration, plus application admin from that temporary project assignment. When a service account is created, does it get exactly the minimum necessary permissions? Or does it get admin rights because figuring out the minimum is time-consuming and admin rights definitely work? The path of least resistance is almost always more permissive than it should be. And over time, that accumulates into significant over-privileging. This matters because privilege is what determines the impact of compromise. A stolen credential with read-only access to a single database is bad. A stolen credential with admin rights to your cloud environment is catastrophic. The Lifecycle Management Challenge Identity sprawl isn’t just about how many identities exist. It’s about how they’re managed over time. User accounts have a lifecycle: provisioned when someone joins, modified when they change roles, deprovisioned when they leave. Most organizations have at least basic processes for this (though they’re not always followed consistently). Except when the applications themselves make that impossible. You’ve got an application where audit logs are tied directly to user accounts. Not to user IDs that persist after account deletion—actually tied to the active account. Which means if you delete the user, you delete all their audit records. Now you’re in a bind. Someone leaves the organization. Policy says terminate their access immediately. Compliance says retain audit records for seven years. The application developer made a terrible design decision years ago, and now you’re stuck with it. So what do you do? You disable the account but can’t delete it. It sits there—disabled but present—for years. Multiplied across dozens of applications with similar problems, you end up with hundreds or thousands of disabled accounts that you can’t fully remove because some application somewhere has tied critical data to their continued existence. And this isn’t even your security team’s fault. You’re living with technical debt created by a software vendor who didn’t think about identity lifecycle management when they built the application. But it’s your attack surface now. Service accounts? API keys? OAuth applications? The lifecycle is often “created once, exists forever.” Nobody deprovisions a service account when the application that used it is retired, because nobody remembers that the service account exists. Nobody rotates an API key when the project that needed it is completed, because the key is embedded somewhere and nobody’s sure where. Nobody reviews OAuth application permissions to see if they’re still appropriate, because that’s not part of anyone’s job and there’s no process for it. So you end up with identity debt. Credentials that exist but shouldn’t. Access that was granted but isn’t needed anymore. Trust relationships that made sense three years ago but the business context has completely changed. And this debt accumulates, creating an ever-expanding attack surface. What Good Looks Like (It’s Still Hard) Even mature organizations struggle with this. But they have some things in place that make it manageable: They have inventory of service accounts and non-human identities. Not perfect, but maintained well enough that they know what exists and can review it periodically. They have secrets management infrastructure. API keys and credentials aren’t stored in code or config files—they’re in a secrets vault with access controls and audit logging. They enforce credential rotation. Automated where possible, tracked when manual intervention is required. Not perfect, but happening regularly enough that credentials don’t live forever. They use short-lived credentials where feasible. Tokens that expire. Temporary access grants. Just-in-time elevation for administrative tasks. They have processes for access reviews. Not just user access—service account permissions, API key scopes, OAuth application grants. Regular reviews to identify and remove access that’s no longer needed. They monitor authentication patterns. Anomalous service account usage. API calls from unexpected locations. Token usage outside normal patterns. This helps detect compromised credentials even if prevention wasn’t perfect. And they accept that this is ongoing work. Identity sprawl doesn’t get “fixed”—it gets managed continuously. Starting From Where You Are If you’re in an organization with poor identity management (and most are), you can’t fix everything at once. Start with inventory. You need to know what exists before you can manage it. User accounts, service accounts, API keys, OAuth applications. This is tedious work but it’s foundational. Implement secrets management for new credentials. Don’t try to retrofit everything immediately, but stop making the problem worse. New API keys go in a vault. New service account passwords are managed properly. Enforce MFA for (human) user accounts. This is lower-hanging fruit than fixing all the service account problems, and it significantly reduces the risk of user account compromise. Identify your most privileged credentials and protect those first. The service account with admin access to production databases. The API key that can modify cloud infrastructure. The OAuth application with broad scopes across critical systems. Make sure those are rotated, monitored, and properly secured even if you can’t do that for everything yet. Build processes for deprovisioning. When a user leaves, their account gets disabled. When an application is retired, its service accounts get disabled. When a project ends, its API keys get revoked. This requires discipline and organizational process, but it stops identity debt from accumulating as quickly. And document what you can’t see. Be explicit about the blind spots. “We have inventory of service accounts in these systems but not in those systems.” “We know API keys exist in these applications but can’t enumerate them without vendor cooperation.” “We can track OAuth grants in our corporate IdP but not in the SaaS applications that use social login.” This goes in your risk register, by the way. Not as something you’re okay with—as a documented gap with known risk that you’re working to address. When you eventually have an incident involving one of these blind spots, you want to be able to show that you identified the problem, escalated the risk, and were working within resource constraints. Not that you were blindsided because you never looked. The Cultural Problem The technical solutions for identity management exist. Secrets management tools, identity governance platforms, automated provisioning and deprovisioning, privileged access management systems—the tooling is available. It’s also expensive and often highly complex. Identity governance platforms aren’t cheap, and some of them require dedicated staff just to maintain the thing. But let’s say you get budget approval and buy one of these systems. You’ve solved the problem, right? Not even close. I’ve seen an organization with a top-tier privileged access management system—premium licensing, all the features, the works. Four years into their contract. I asked to see their secrets inventory. Twenty-five secrets. Total. Twenty-five secrets. Two hundred employees. Ten IT staff. Four years of licensing costs. Nobody was using it. Why? Because someone—probably with good intentions—built a workflow that required you to enter a change ticket number to retrieve a secret. The system didn’t validate it or correlate it to their actual ticketing system. It was just a mandatory field. A speed bump that added friction without adding value. So the IT team, including the IT manager, just… worked around it. Credentials went back into spreadsheets, into config files, into the same places they’d always been. The expensive tool sat there, mostly empty, generating reports that nobody read about the few test credentials someone had bothered to load during implementation. They were paying for a solution they weren’t using because someone made it too painful to use correctly. This is the real problem with identity management tooling. It’s not that the tools don’t exist. It’s that implementing them in a way people will actually use—without creating so much friction that everyone routes around them—is harder than anyone wants to admit. The harder problem is organizational. Getting developers to use secrets management instead of hardcoding credentials. Getting operations teams to rotate service account passwords. Getting business units to actually review and approve access when asked. Getting leadership to fund the tooling and the staff time required to implement it properly. Identity sprawl is partly a technical problem and partly a process problem and partly a cultural problem. You can’t fix it just by buying tools. You need organizational buy-in, process discipline, and ongoing attention. That takes time. And it takes making the case that this actually matters—that identity is the attack surface that needs the most attention, and that managing it properly is worth the investment. Practical Takeaways Identity is the real perimeter in modern environments. Credential compromise is how most breaches happen. Service accounts and API keys are less visible than user accounts but often more dangerous. They’re over-privileged, long-lived, and poorly tracked. Inventory is foundational. You can’t manage what you don’t know exists. Start with knowing what identities and credentials are out there. Secrets management isn’t optional for new credentials. Stop making the problem worse even if you can’t immediately fix the existing mess. Credential rotation reduces the window of compromise. Automate where possible, enforce where automation isn’t feasible. MFA for (human) user accounts is lower-hanging fruit than fixing service account problems. Do the easier thing that still has significant impact. Access reviews need to include non-human identities. Service accounts, API keys, OAuth applications—all of it needs periodic review. Monitor authentication patterns for anomalies. Detect compromised credentials even if you can’t prevent them perfectly. Identity sprawl is managed continuously, not fixed once. This is ongoing operational work, not a project. The post Week 5: The Identity Sprawl Problem appeared first on Cultivating Security.
-
10
Why Chat-Based AI Tools Fail in Operational Security: Building Capability vs. Productivity
AI as Capability, Not Conversation: Why Chat-Based Tools Fail Operational Security Work In the last 18 months, every vendor has suddenly “integrated AI” into their products. Your SIEM has AI now. Your ticketing system has AI. Your monitoring platform has AI. I’ve even seen job schedulers get rebranded with AI features—automation that’s been running for years, now with a fresh coat of marketing paint. But here’s what’s interesting: most of these vendors won’t tell you which model they’re using. They won’t tell you how it’s trained. They won’t tell you what the prompts look like or how decisions actually get made. It’s just “AI-powered”—a checkbox on the feature list, not a technical specification. Stanford’s Foundation Model Transparency Index (https://crfm.stanford.edu/fmti/) found that companies are “most opaque” about training data and compute—the exact details you’d need to evaluate whether their AI actually fits your use case. The average transparency score dropped from 58 to 40 between 2024 and 2025. That’s the wrong direction. Some of this is legitimate. Machine learning has been in security tools for years, and calling it AI now isn’t entirely dishonest. Some of it is basic automation with better PR. The cynical part of me wonders how many “AI features” are just well-tuned regex with a GPT wrapper for the demo. Reuters recently covered what they’re calling “AI washing”—overstating AI capabilities in marketing claims (https://www.reuters.com/legal/legalindustry/ai-washing-regulatory-private-actions-stop-overstating-claims-2025-05-30/). From what I see in vendor pitches, that feels about right. But that’s not really the problem. The problem is that AI is on everyone’s tongue, but almost nobody can answer the next question clearly: What are you actually using it for? Not “do you have it”—what work is it doing? What decisions is it making? What outcomes is it improving? And critically: how do you know it’s working? That’s the question I had to ask my own leadership recently. Not because I’m skeptical of AI—I use it every day, multiple times a day. I’ve built AI-integrated components. Parts of this article were drafted with AI assistance. If you’re listening to the audio version, that’s AI text-to-speech. I’ve seen AI accelerate work in ways that let me run laps around my old self. I’ve also seen it fail spectacularly when used carelessly—when people treat it like Google instead of a reasoning tool, or when they trust confidence over correctness. Here’s a concrete example: AI will confidently give you what it thinks a YAML standard should look like based on common patterns, rather than what a specific developer’s implementation actually expects. It’ll pull from an article published two years ago when you know there have been three major updates since then. It guesses instead of searching. It synthesizes instead of retrieving. Sometimes that’s useful. Sometimes it wastes hours of your time chasing answers that were plausible but wrong. You learn to recognize those failure modes with experience. Most people on your team haven’t built that instinct yet. So when I asked “what are we doing with AI,” I wasn’t asking whether we should use it. I was asking whether we were building capability or just licensing productivity tools and hoping they’d scale to operational work. Those aren’t the same thing. And for teams doing security, compliance, risk, or audit work, the difference matters a lot. The Mismatch Between Productivity and Capability Let’s be clear about what tools like Microsoft Copilot are actually good at. They’re excellent for knowledge work augmentation: drafting emails, summarizing documents, explaining code, ad-hoc Q&A inside your productivity suite. That’s valuable. For individual contributors doing unstructured creative or analytical work, chat-based AI can genuinely accelerate output. I’ve seen this work. I was writing a performance review recently and needed to recall some specifics from about 10 months back. I asked Copilot for the concept—not exact keywords—and it found the email thread. Surfaced the right conversation, I re-read it, and it helped me fine-tune the review. That’s legitimately useful. That’s semantic search doing what it’s supposed to do. I also tried going back three years for something else. The emails were there—our retention policy goes back that far—but Copilot couldn’t find what I was looking for. That’s fine. Not a dealbreaker. Just a reminder that there are boundaries to what the tool can reliably do, even in its core use case. For individual productivity work, that’s more than acceptable. But operational security work doesn’t look like that. SOC analysis requires structured, repeatable outputs. Risk classification needs to be consistent across analysts. Incident enrichment needs to follow known patterns. GRC assessments need to be auditable. Compliance documentation needs to be defensible. And critically—the context of your environment and your organization’s risk tolerance need to be baked into the analysis, not reinvented every time someone asks a question. Chat-based AI assumes every user will build their own mental model, experiment, iterate, and interpret results on their own. That works fine when you’re brainstorming or learning. It breaks down when the work requires consistency, when variance equals risk, and when someone might audit your decision-making process six months later. Here’s the core problem: Copilot optimizes for individual productivity. Operational work requires institutional capability. Perfect angle. That’s the downstream cost nobody thinks about. Let me add it: For my team, that’s not a subtle distinction. We can’t have five analysts producing five different risk classifications for the same control gap because they each prompted the AI differently. We can’t have audit findings that depend on who happened to run the analysis that day. We can’t have incident summaries that vary wildly in quality based on someone’s skill at follow-up questions. And here’s what that variance costs in practice: a single word in a prompt can mean the difference between ten hours of remediation work and zero. If one analyst’s prompt leads to “this is a critical gap,” IT starts emergency patching, change requests get escalated, projects get delayed. If another analyst’s prompt—asking about the same control in the same environment—leads to “this is acceptable given your current posture,” nothing happens. Same control. Same environment. Different prompt. Completely different operational outcome. That’s not a quality-of-life issue. That’s an organizational efficiency problem. And when you’re working in a regulated environment where audit findings trigger mandatory remediation timelines, the stakes get even higher. You can’t defend “well, it depends on who ran the analysis” to an auditor or your executive team. Here’s a concrete example of how much variance lives in the prompting itself. I just tested this with Copilot in an unauthenticated browser session. Same control, two different prompts. First prompt: “What if we don’t implement the CIS Control ‘Microsoft network client: Digitally sign communications (always)’ is set to ‘Enabled’? For context, our environment is segmented, we have a robust defense in depth program, we run Windows Defender and have all but 3 ASR rules set to enforce, we have active SIEM logging from endpoints and firewalls, and our network is segmented with strong firewall rules in place.” Copilot’s answer: “In short: your layered defenses already reduce the risk, but not enabling this control leaves a gap in SMB integrity. It’s not catastrophic in your setup, but enabling it would close off a potential lateral movement vector with minimal downside unless legacy compatibility is an issue.” Second prompt: “What if we don’t implement the CIS Control ‘Ensure Microsoft network client: Digitally sign communications (always)’ is set to ‘Enabled’?” Copilot’s answer: “In short: not enabling this control leaves your network traffic vulnerable to tampering and credential theft. The security risk far outweighs the minor performance gains of leaving it off.” Same control. Same tool. Wildly different risk assessments. The first prompt included organizational context—segmentation, defense in depth, existing controls. The second didn’t. The result: one answer says “not catastrophic in your setup,” the other says “the security risk far outweighs the minor performance gains.” If you’re an experienced analyst, you know which answer is more useful. You know that risk doesn’t exist in a vacuum—it exists in the context of your environment, your existing controls, your threat model. You know to include that context when you ask the question. If you’re six months into the job, you might not. You might ask the simpler question, get the scarier answer, and escalate a risk that doesn’t actually need escalation given your current posture. Or worse—you might accept the nuanced answer without understanding why those other controls matter, and miss a gap when one of them isn’t actually implemented correctly. And here’s the part that doesn’t get talked about enough: some people won’t push back on AI output at all. Not because they’re not smart—they are—but because there’s a built-in deference to the tool. “AI is probably smarter than me, right?” It’s the same authority bias we see with any expert system. If the output sounds confident and well-structured, it gets accepted. AI is excellent at pattern matching, synthesis, and language generation. It is not excellent at deep domain reasoning, organizational context, or understanding what “good enough” means in your specific environment. Those require human judgment—specifically, experienced human judgment. An analyst with six months on the job might not know when an AI-generated risk assessment has missed a critical business context that would change the severity rating. They might not recognize when a technically correct answer is operationally useless. If AI outcomes depend on individual judgment at every step, you haven’t built a capability—you’ve introduced another source of variability. The Hidden Risk: Judgment Variance The usual framing around AI adoption focuses on “skill gaps” or “prompt engineering.” That’s not wrong, but it misses something deeper. The real risk isn’t that junior analysts are bad at prompting. It’s that they don’t have the depth to challenge, constrain, or contextualize AI-generated answers. AI sounds confident even when it’s subtly wrong. Less experienced staff may accept answers at face value, miss nuance, miss downstream implications, or not know what question should come next. They might not realize when an answer is technically correct but operationally misleading. They might not catch when the AI has drifted out of scope or when it’s optimizing for coherence instead of correctness. I’ve been doing this long enough that I’ve built instincts for when something feels off. I know when to push back on an answer. I know what constraints to apply. I know what the next logical step in the analysis should be. That didn’t come from a training course—it came from years of seeing what breaks, what matters, and what auditors actually care about. Junior staff don’t have that yet—and that’s fine. That’s what learning looks like. But when we hand them a tool that externalizes all of that judgment to them, we’re not accelerating their work. We’re amplifying the risk that confident-sounding output gets accepted without the right scrutiny. In regulated environments, that’s not just a quality problem. It’s a governance problem. Context Pollution: The Failure Mode Nobody Talks About There’s another issue that only shows up with real-world use, and it’s one of the most under-acknowledged risks of chat-based AI: context pollution. Here’s what I mean. In a long-running chat session, earlier interactions don’t just disappear—they accumulate. Earlier assumptions linger. Partial conclusions get reused as facts. Edge cases from one question bleed into unrelated questions. After enough back-and-forth, the model starts optimizing for coherence with the conversation history rather than correctness for the current question. I’ve seen this firsthand. I use Cursor heavily for development work, and early on I was iterating continuously in the same session—asking questions, refining code, tweaking logic. After a while, the model started introducing problems. It refactored stable functions that didn’t need to change. It made assumptions that were relevant three questions ago but not anymore. It drifted out of scope in ways that were subtle but real. Once I started spawning fresh chat sessions for each discrete task, quality immediately improved. Context stayed aligned. Scope stayed tight. The outputs were more reliable because the model wasn’t dragging forward assumptions from earlier in the conversation. I didn’t figure this out on my own—I picked it up from one of the Cursor engineering team’s videos where they discussed this exact issue. I tested it. The quality went through the roof. A side project I’d been dabbling with on and off for over a year suddenly broke through. In about a month and a half of nights and weekends—maybe two to three days a week—I accomplished more than I had in the previous six months of fixing what got broken or backing out of rabbit holes I couldn’t even explain. That’s not a bug—it’s how conversational context windows work. As a session grows, the model doesn’t know which parts of the prior context are still valid. It just knows they exist. So it weights them into the next response, even when they shouldn’t apply anymore. I know this now. I watch for it. I reset when I need to. I ask clarifying questions to tighten scope when I feel things drifting. Most team members won’t. They’ll keep going, unaware that the analysis has quietly degraded. Hallucinations increase. Confidence stays high. Substance gets lost. And here’s the thing—even knowing this, I still fall into the trap. I’ll get deep into a back-and-forth session, correcting and refining, iterating toward something that’s almost right. The context window gets overloaded, polluted with half-finished threads and old assumptions. I can feel the quality starting to slip, but I’m so close. The temptation is to just push through and finish it off. So I keep going. And I just dig the hole deeper. My own workaround: when I start a new engagement, I build a prompt that helps me restart cleanly. That way, as I iterate, I can take the useful output, fold it into a fresh prompt in a new session, and start clean. It works. But in the heat of the moment—when you’re on a roll, when the deadline’s tight, when you’re sure you’re just one more iteration away from done—the discipline to stop and reset is hard to maintain. If I struggle with that, and I know what to watch for, how well do you think a junior analyst handles it when they don’t even know context pollution is a thing? This happens faster with rapid iteration and complex domains—exactly the conditions that SOC work, incident response, and GRC analysis create. And it’s almost invisible to someone who doesn’t already know what good output should look like. Vendors demo short sessions, clean prompts, happy paths. Real work looks like 30- to 90-minute deep dives with revisions, corrections, iteration, and scope creep. That’s where context pollution appears. And Copilot makes this worse, not better, because it encourages persistent interaction, blends content from multiple sources, and hides context boundaries from users. When AI output quality degrades gradually due to accumulated context, less experienced users often don’t notice—and that’s where judgment risk turns into institutional risk. Standard Prompts Don’t Solve This The obvious response is: “Fine, we’ll create standard prompts and train people to use them consistently.” That helps. But it doesn’t solve the underlying problem. Even with standardized prompts, long-running chat sessions still introduce variability. The prompt starts the conversation—it doesn’t control follow-up logic, enforce output structure, prevent scope drift, or encode the institutional knowledge that tells you what to do next. Here’s the thing: prompts alone don’t encode expertise. Context does. Context includes what inputs are allowed, what assumptions are valid, what constraints apply, what the next step in the workflow is, and what “good” versus “concerning” output actually looks like. Two people can use the exact same prompt and get very different outcomes depending on what they know to do with the response. One might recognize when the AI has given a technically accurate but operationally useless answer. The other might take it at face value and move on. That variability doesn’t live in the prompt. It lives in everything around the prompt—the judgment, the workflow, the institutional memory of what works and what doesn’t. Chat-based AI externalizes all of that context to every user, every time. For exploratory work, that’s fine. For operational work, it’s a structural problem. The Engineering Model That Actually Works So what does work? The answer isn’t complicated—it’s just different from what most vendors are selling right now. Model AI as a stateless service. Here’s what that looks like in practice. You take a repeatable task—SOC2 control analysis, risk classification, incident summary generation, whatever—and you build a lightweight service around it. Fresh context per invocation. Explicit, structured inputs. Predefined prompts with known constraints and controlled parameters. Deterministic sequencing. Structured outputs. Logged inputs and outputs for auditability. Each analysis starts from a clean, controlled context. Nothing persists unless you explicitly persist it. Context pollution is nearly eliminated by design because there’s nothing to pollute—no conversational state, no hidden assumptions, no lingering conclusions from previous runs. This isn’t “AI magic.” It’s applying the same engineering discipline we use everywhere else in production systems: stateless services, clear contracts, versioned logic, repeatable operations, monitoring and observability. You wouldn’t build a production application on mutable global state. You shouldn’t build AI workflows that way either. And that’s exactly what conversational AI is—mutable state. Every interaction changes the context. Every follow-up question adds assumptions. When AI starts pulling from prior conversations or blending context across sessions, you’ve got shared mutable state across an entire organization. That’s a recipe for unpredictable behavior. Stateless AI services solve this by design. Every invocation is isolated. Every analysis starts from the same known baseline. The only state that matters is what you explicitly pass in—and you control exactly what that is. And here’s where it gets powerful: within the service, you can encode the parameters that matter for your organization. Say you’re building a risk assessment service. If you clearly define your organization’s risk tolerance, existing control posture, and environmental context up front, you level the playing field. An experienced analyst who knows to provide that context and a junior analyst who doesn’t both get the same baseline. You also control for analysts whose personal risk tolerance might be stricter—or looser—than what the organization is actually willing to accept. Remember the CIS control example earlier? The difference between “not catastrophic in your setup” and “security risk far outweighs the minor performance gains” came down entirely to whether organizational context was included in the prompt. In a stateless service, that context isn’t optional—it’s baked into the service definition. Every invocation gets it. Every output reflects it. When you model AI this way, you get real operational benefits. Outputs are consistent regardless of who runs the task. Junior staff get the benefit of senior judgment without needing to become prompt engineers. Quality is reviewable and auditable. Changes are versioned and intentional. Risk posture is explainable to auditors. You’re not asking your team to “use AI.” You’re giving them better tools that happen to be AI-powered. That’s a critical distinction. I’ve built this model as a microservice tool for SOC2 analysis—not as a full product, just a proof of concept to see if the approach held up under real use. So far, in testing it has. Here’s exactly what this costs and saves, because nobody talks about real numbers – Under $10 in API calls for about 15 SOC2 control analyses. In the near future, it’s available to my team for testing. If we adopt it after vetting and getting buy-in, it’ll become the default tool. And the time savings are real. I think this tool will save four to five hours per SOC2 report. It can digest 10- to 20-page SOC reports—sometimes longer—and surface the nuggets of information that are actually necessary to make an informed analysis. That means my analysts don’t have to die slowly inside reading 20,000+ words of god-awfully boring compliance language to find the things that matter. A knowledge worker still has to do the work. The tool doesn’t make decisions. But it streamlines the hell out of the process. Analysts fill in structured fields, click run, review output, and apply their judgment to the result. The consistency and quality are significantly higher than ad-hoc chat usage, and nobody has to become a prompt engineer to use it. That’s what institutional capability looks like. You capture the mental model once and reuse it. You don’t ask every analyst to reinvent it from scratch every time they open a chat window. Why This Matters for Regulated Work Let’s connect this back to what actually happens in operational security and compliance work. When AI is embedded in structured workflows: Outputs are consistent regardless of who runs the task Quality doesn’t depend on someone’s ability to ask good follow-up questions Analysis is auditable—you can trace inputs, prompts, and reasoning Changes are versioned and deliberate, not accidental drift Risk decisions are defensible when someone asks “how did you reach this conclusion?” Institutional knowledge gets preserved and reused instead of locked in individual practitioners When AI lives in chat: Outputs depend on individual skill, experience, and judgment Quality varies based on who’s using it and how long the session has been running Auditability is fuzzy at best—good luck reconstructing the reasoning six months later Variance compounds over time as different people develop different habits Risk decisions are hard to defend because the process isn’t repeatable Senior expertise stays trapped in senior people—it doesn’t scale For SOC analysis, GRC assessments, incident response, audit prep—work where variance is risk—that difference isn’t academic. It’s the difference between a capability you can rely on and a tool that introduces as many problems as it solves. If AI depends on how well someone prompts, it’s not a scalable solution for a regulated team. Where Chat Still Fits To be clear: chat-based AI has legitimate, valuable use cases. It’s excellent for brainstorming, learning new domains, drafting content, exploring ideas, ad-hoc research, and answering one-off questions. For those contexts, conversational flexibility is the feature, not the bug. You want open-ended exploration. You want the ability to iterate and refine. You want the model to follow your train of thought even when it’s messy. The issue isn’t that chat is wrong. It’s that chat and capability are different things, and most organizations are conflating them. Copilot is a productivity tool. It’s designed to make individuals more effective at unstructured knowledge work. That’s useful. But it’s not the same as building AI into operational workflows where consistency, governance, and auditability matter. Both can coexist. Chat-based tools for exploratory work and individual productivity. Modeled services for structured operational work. The mistake is assuming one can replace the other, or that buying productivity tools will automatically deliver capability outcomes. The Real Strategic Question Here’s what it comes down to. Organizations face a choice, whether they realize it or not: treat AI as personal productivity software and accept the inherent variability, or engineer it like any other enterprise capability with appropriate controls and governance. Both approaches are valid. Mixing them—buying productivity tools and expecting capability outcomes—is where things break. If you’re deploying chat-based AI for individual use, you need to accept that outputs will vary by user, that quality will depend on skill and judgment, and that governance will be loose. That’s fine for a lot of work. It’s not fine for everything. If you need consistency, auditability, and institutional capability, you need to structure AI differently. You need to treat it like any other production system: define inputs, control context, version logic, monitor outputs, build review gates. The question to ask isn’t “should we use AI?” It’s “where does AI need to be modeled as a service, and where can it live as a chat tool?” Most organizations haven’t asked that question yet. They’re still in the “we bought licenses, therefore we’re doing AI” phase. That’s not strategy—that’s procurement with a narrative. What This Looks Like in Practice If you’re trying to figure out where your organization actually is on this, here are the questions worth asking: What work requires consistency and auditability?That’s where AI needs to be embedded in structured workflows, not left to individual chat sessions. SOC analysis, risk classification, compliance assessments, incident documentation—those aren’t exploratory tasks. They’re repeatable processes that need institutional quality control. What work benefits from exploration and flexibility?That’s where chat-based tools fit. Research, brainstorming, learning new domains, drafting content—work where the goal is discovery, not consistency. Are you building the structure around AI, or just buying the interface?Prompts alone won’t get you there. The real work is defining inputs, constraints, workflows, and review gates. If you’re not investing in that structure, you’re not building capability—you’re distributing tools and hoping individual users figure it out on their own. Can you explain how AI-generated decisions were reached if an auditor asks?If the answer is “it depends on who ran the analysis and what they asked,” you have a governance problem. If the answer is “here are the versioned prompts, logged inputs, and structured outputs,” you’re in much better shape. What happens when your most experienced users leave?If their expertise lives in their heads and their chat sessions, it leaves with them. If it’s encoded in modeled services, it stays with the organization. Those questions will tell you whether you’re building capability or just distributing tools and hoping for the best. A Final Note I’m not skeptical of AI. I use it daily—multiple times a day. I’ve built AI-integrated components. I’ve seen it work. I’ve also seen where it breaks down, and it’s almost never the model’s fault. It’s the structure around it. Or the lack of structure. The teams that figure out where AI needs guardrails and where it needs freedom will build real capability. The ones that treat it as productivity dust they can sprinkle on everything will spend the next two years wondering why outcomes didn’t match the demos. We’re past the point of debating whether AI is useful. The question now is whether we’re deploying it in ways that match how the work actually gets done—and in regulated environments, how the work needs to get done. Variance isn’t just inefficiency. It’s risk. Chat-based AI is a tool. Modeled AI services are a capability. Both have a place. Just don’t confuse one for the other, and don’t expect your $30/month Copilot licenses to solve problems they were never designed to address. The organizations that get this right early will have a real advantage. The ones that don’t will figure it out eventually—usually after enough variance-driven incidents that someone finally asks the question I had to ask my SVP: “What are we actually using AI for?” If you can’t answer that clearly, you’re not behind on AI adoption. You’re just not there yet. The post Why Chat-Based AI Tools Fail in Operational Security: Building Capability vs. Productivity appeared first on Cultivating Security.
-
9
Week 4: The Logging and Visibility Problem No One Mentions
You probably think you can see more than you actually can. That’s not a criticism—it’s just how modern environments work. The assumptions we built our mental models on (servers you own, networks you control, applications you can instrument however you want) don’t hold anymore. But we still operate like they do. SaaS applications don’t give you the same visibility you’d have if you ran the application yourself. Cloud providers give you logs, but not necessarily the logs you need. Third-party integrations happen at the API layer where your network monitoring can’t see them. Serverless architectures create ephemeral compute that exists for seconds and then disappears. And somehow we’re supposed to detect threats, investigate incidents, and demonstrate compliance in environments where half of what’s happening is invisible to us. The gap between what vendors promise and what their APIs actually deliver is real. The difference between “we provide comprehensive logging” and “here’s what you can actually export and how much it costs” is often significant. And most organizations don’t discover this gap until they’re in the middle of an incident and realize they can’t answer basic questions about what happened. The Old Model (It’s Gone) Ten to fifteen year ago (roughly 2010-2015 for those reading this in the future), visibility was hard but at least it was straightforward. You owned the servers. They sat in your datacenter. You controlled the network. You could put whatever monitoring and logging you wanted on them. Want to know what happened on a system? You had the logs. Want to capture network traffic? You owned the infrastructure. Want to instrument an application? You controlled the deployment. The constraints were mostly technical and resource-based. Storage was expensive, so you couldn’t keep logs forever. Processing power was limited, so you couldn’t analyze everything in real-time. But in theory, if you had the resources, you could see everything that mattered. That model is mostly dead now. The New Reality (It’s Complicated) Modern environments are a mix of SaaS applications you don’t control, cloud infrastructure you sort-of control, on-premises systems you fully control (but are increasingly a minority), and mobile/remote users connecting from everywhere. Each piece has different visibility characteristics, different logging capabilities, different costs, different APIs, different limitations. SaaS applications are particularly tricky. You get whatever logging the vendor decides to provide. Sometimes that’s comprehensive. Sometimes it’s basic audit logs that tell you who logged in but not much about what they did. Sometimes it costs extra. Sometimes it’s only available in higher-tier plans. Sometimes the data is there but the API to extract it is rate-limited or poorly documented. You don’t control the infrastructure, so you can’t just install an agent or capture packets. You’re dependent on what the vendor exposes, and their priorities aren’t always your priorities. Cloud providers give you a lot of visibility—if you know where to look and how to configure it. But it’s not automatic. CloudTrail in AWS doesn’t log data events by default. Azure Activity Logs don’t capture everything. GCP audit logs need to be configured per-service. And all of this generates massive volumes of data that cost money to store and process—much of it operational noise with limited security value. You’re often paying to retain logs that don’t help you detect or investigate incidents, while the events you actually need might require additional configuration or higher-tier services to capture. The visibility is there, but you have to deliberately build it. And you have to pay for it, which means someone has to approve that cost. The Vendor Promise vs. Reality Gap Here’s a conversation that happens constantly: Security team: “We need comprehensive logging for [SaaS application].” Vendor sales: “Absolutely, we take security very seriously. We provide full audit logging of all activities.” [Six months later, during implementation] Security team: “We’re ready to integrate your logs into our SIEM.” Vendor support: “We don’t have a direct SIEM integration. You can manually export logs from the admin console. But don’t worry—if there’s ever an incident, we’ll help you get whatever logs you need.” [Fourteen months later, during an incident] Security team: “We need the access logs for this compromised account for the past 90 days.” Vendor support: “Unfortunately we can’t provide those. Our logging infrastructure commingles customer data and we don’t have a way to filter and export just your logs. We can give you authentication events and admin actions, but detailed access logs aren’t available.” This isn’t malicious. The vendor’s not lying, exactly—they do have logging. It’s just that what they consider “full audit logging,” what’s actually accessible to you, and what you need for security investigation are three different things. And you don’t find out about the gap until you need the logs. What You Can’t See (It’s More Than You Think) In a typical modern environment, visibility gaps fall into several categories: SaaS and vendor-controlled systems Authentication visibility depends on whether you control it. If the application federates through your SSO, you can see logins in your identity provider logs. If it doesn’t, you’re dependent on what the vendor provides—which might be limited to their admin console with no way to export or integrate with your monitoring. Beyond authentication, detailed user activity—what documents they accessed, what data they downloaded, what API calls they made—is often not available, or only available at premium tiers, or only retained for short periods. And even when these logs exist, they’re often in proprietary formats with no API for automated export, or the API is rate-limited to the point of being useless for real-time monitoring. Manual exports from a web console aren’t a viable solution at scale. API traffic between applications is similarly opaque. Service-to-service authentication, automated integrations, data exchanges—in an on-premises environment you could at least capture this with network taps, port mirroring, or API gateways and proxies that intercept and log traffic. In SaaS environments, you don’t control the underlying infrastructure, so that’s not an option. You’re entirely dependent on application-level logging that the vendor may or may not provide. The major providers like Microsoft 365 might expose some API activity logs. Most mid-tier and startup SaaS vendors don’t expose these logs to customers at all. Cloud infrastructure Someone spins up an EC2 instance, uses it for a few hours, and terminates it. If you’re not capturing those events in real-time and you’re not paying to retain them long-term, they might as well have never happened. Ephemeral resources come and go, and if your logging isn’t configured to catch them, you have no record they existed. Shadow IT Departments using applications that IT doesn’t know about. Data being stored in places that aren’t sanctioned. Integrations being set up by business users who don’t think about security implications. By definition, you can’t monitor what you don’t know exists. This is often SaaS applications purchased with credit cards, bypassing procurement and IT approval entirely. Unmanaged and BYOD endpoints Users accessing resources from personal devices, from home networks, from coffee shops. You might see the authentication, but the endpoint visibility you’d have on a corporate-managed device isn’t there. MDM solutions like Intune can give you some visibility—device compliance status, patch levels, whether antivirus is running. But can you get forensically useful logs? Can you see process execution, network connections, file access? Not in the same way you could with a corporate-managed endpoint running full EDR. You know the device met your baseline requirements when it connected, but you don’t have the detailed telemetry you’d need to investigate suspicious activity. Encrypted traffic SSL/TLS everywhere is good for security. It’s terrible for visibility if you’re trying to inspect traffic. TLS interception is becoming more common and easier to implement, but it’s not comprehensive. Some traffic has to be excluded—applications using certificate pinning, mutual TLS authentication, or endpoints that break when you try to intercept. Medical devices, some IoT, certain vendor integrations—these often can’t tolerate interception without breaking functionality. And even when interception is technically possible, the operational overhead of managing it (certificate distribution, exclusion lists, troubleshooting broken applications) means you’re making trade-offs about what you actually inspect. You get visibility into some encrypted traffic, but you’re still trusting a lot of it. The Cost Problem Logging isn’t free, and comprehensive logging is expensive. Cloud providers charge for log storage. SIEM vendors charge per GB ingested. Analysis tools charge for compute. Every log source you add, every event you capture, every day of retention you want—it all costs money. And in cloud environments, the costs compound. If you’re in AWS sending logs to a cloud-based SIEM: you’re charged to generate the logs, charged for egress to transmit them, charged per GB to ingest them into your SIEM, and charged to store them—often in multiple places. You might not realize a log source isn’t security-relevant until months in, when you finally get around to normalization and building use cases, only to discover you’ve been paying to collect noise. So you make trade-offs. You log authentication events but not every API call. You capture critical system changes but not routine operations. You retain logs for 90 days instead of a year because that’s what the budget allows. These are reasonable decisions based on real constraints. But they create gaps, and you need to understand what those gaps are. The other cost is operational. More logs mean more noise. More alerts. More false positives. More time spent tuning and managing the logging infrastructure instead of actually using the data. There’s a balance between “log everything” (expensive, noisy, often impractical) and “log nothing” (cheap, quiet, useless). Finding that balance requires understanding what you actually need versus what would be nice to have. What You Actually Need (It’s Less Than Everything) You can’t log everything. But you can make intelligent choices about what matters most. Take Windows event logs as an example. Out-of-box default configuration gives you some useful security events, but other valuable events aren’t enabled by default. And if you just ingest all of Application, System, and Security logs without filtering, you’ll be buried in operational noise with no security value. There are guides for what to collect, but actually implementing that filtering—especially depending on your SIEM’s capabilities—takes time, expertise, and persistence. You have to keep asking: do I need this? Is this useful? It’s not a one-time configuration; it’s ongoing refinement. Core log categories worth prioritizing: Authentication and authorization events. Who logged in, when, from where. Successful and failed attempts. Privilege changes. Access grants and revocations. This is foundational—you need to know who did what, and that starts with knowing who was authenticated. Administrative actions. Changes to configurations, policies, permissions. Creation and deletion of resources. Anything that modifies the security posture or operational state. These are high-value events that should basically always be logged. Access to sensitive data. If you have data that’s particularly valuable or regulated, you need to know who accessed it. This is harder in SaaS environments, but it’s worth fighting for. Security-relevant events. Firewall blocks. IDS/IPS alerts. Antivirus detections. Authentication failures beyond normal thresholds. Things that might indicate compromise or attack. Change tracking for critical systems. What changed, when, and who changed it. For production systems, for security infrastructure, for anything where unauthorized changes could cause serious problems. You don’t necessarily need to log every single read operation in a database. You don’t need verbose debugging output from every application. You don’t need to capture every DNS query (unless you’re doing DNS-based detection, which is legitimate but specialized). Figure out what your crown jewels are and what your riskiest systems are—make sure you have good visibility for those. Everything else is nice-to-have, and you prioritize based on resources. The Detection Problem Logs only matter if you actually use them. And using them means having some way to detect anomalies, threats, or policy violations. If you’re collecting logs but nobody’s looking at them except during incidents, you’re doing expensive archival, not security monitoring. But here’s the thing: archival has value too. Commercial aircraft carry flight data recorders—black boxes that continuously record flight parameters, cockpit audio, and system status. Nobody monitors this data in real-time. The recorder just captures everything. But when something goes wrong, investigators use that data to reconstruct exactly what happened, second by second. Without it, you’re left guessing. Your SIEM serves the same purpose. Even if you’re not actively monitoring every event in real-time, having that forensic capability when you need to investigate an incident is critical. Understanding what happened, how, and when depends on having those logs available. So don’t dismiss log collection just because you’re not building sophisticated detection on top of it yet. Detection requires either rules (if X happens, alert) or baselines (if this deviates from normal, alert) or threat intelligence (if we see indicators of known-bad, alert). All of these require effort to build and maintain. Rules need to be tuned. What generates alerts in your environment? What’s normal activity that looks suspicious? What’s actually suspicious but happens rarely enough that you don’t have good patterns for it? This tuning process never ends. Some MDR providers promise to handle this for you, but be careful about the black box approach. If they’re running generic rulesets across all their customers without understanding your specific environment, you’ll get alerts that don’t make sense for your context and miss things that matter in your particular setup. Worse, if your environment is slightly non-standard—a configuration setting that’s just a bit different, a logging format that’s slightly off—their rules might never trigger even when there’s a real problem. The alert they expect to see doesn’t fire because your logs don’t match their assumptions. Effective detection requires knowing your environment, and outsourcing doesn’t eliminate that requirement—it just shifts who needs to know it. Baselines require understanding what normal looks like, which means having enough data to establish patterns, which means you need to have been logging long enough to know what normal is. And normal changes over time, so baselines drift. This gets complicated with fragmented identity visibility. If you’re not ingesting and correlating all authentication paths—on-prem AD, Azure AD/Entra, M365, federated SaaS applications—you can’t establish an accurate baseline for how an identity actually behaves. You might see AD authentications and M365 authentications as separate, unrelated streams. Analyzed independently, neither looks unusual. But if you could see them together, the pattern would be obvious: this user never logs into M365 from that location, or this authentication sequence doesn’t match their normal workflow. Hybrid and cloud identities make baseline detection harder because the full picture is spread across multiple log sources that need to be normalized and correlated. Threat intelligence helps with known threats but doesn’t help with novel attacks or insider activity or misconfigurations that create risk without being actively malicious. And not all threat intelligence is created equal. There are excellent open-source and commercial threat intel sources. There are also terrible ones. I’ve seen threat feeds that included legitimate IPs because someone submitted indicators from a phishing email without filtering out the spoofed sender addresses and legitimate URLs the attacker included to make the email look real. I’ve seen feeds where one bad submission poisoned the entire aggregator. If you just turn on every available threat feed without validation, you’ll either block legitimate traffic or you will see so many false positive alerts, you will stop using your SIEM because it’s too noisy. Threat intel requires curation. You need to understand the source, validate the indicators make sense, and monitor for false positives. And think carefully about how you use it. The most effective approach isn’t using threat feeds to generate alerts or actively block traffic—that’s where bad intel causes the most damage. Instead, use threat intelligence to influence risk scoring. An authentication from an IP that appears in multiple trusted threat feeds gets weighted higher in your risk calculation. A file hash match adds context to other suspicious behaviors. The key is weighting based on source quality and combining threat intel with other correlations and detections you’ve built. Threat feeds are contextual enrichment, not triggers for action. The detection problem is honestly harder than the logging problem in many ways. Logs are just data. Detection is turning that data into actionable information, and that’s where a lot of organizations struggle. The Retention Question How long do you keep logs? It depends on what you’re trying to accomplish. For incident investigation, you need logs that go back far enough to reconstruct what happened. Breaches often aren’t discovered immediately. The median dwell time (time between initial compromise and detection) is measured in weeks or months. If you’re only retaining logs for 30 days, you might miss the early indicators entirely. But there’s a practical limit to how far back logs remain useful for investigation. Within 30 days, you can definitively say what happened based on logs. Within 90 days, same—maybe with a bit more uncertainty as context fades. Six months or more? Things get fuzzy. Systems have been patched or reconfigured. People who made decisions have left. The environment has changed enough that you’re making educated guesses rather than definitive statements. Keeping logs forever isn’t always useful—there’s a point where the forensic value diminishes because the context that makes them interpretable is gone. For compliance, you need whatever the regulation or framework requires. PCI-DSS wants a year. Some frameworks want more. This is non-negotiable if you’re in scope. For threat hunting, you need historical data to identify patterns over time. “Show me all authentication attempts from this IP over the last six months” isn’t a question you can answer if you only have 90 days of logs. But retention is expensive. A year of detailed logs for a medium-sized environment can run into serious money. So you make decisions: keep authentication logs for a year, keep detailed application logs for 90 days, keep packet captures for a week. Hot storage (fast, queryable, expensive) versus cold storage (cheap, slower to retrieve but still readily available, good enough for compliance) versus archival/glacial storage (AWS Glacier, Azure Archive tier—extremely cheap, but retrieval takes hours and costs money). Different retention periods for different log types based on their investigative value. If you’re using archival storage, test the restoration process before you need it. How long does it take to retrieve? What does it cost? Can you query it in place or do you have to restore to hot storage first? Does it require a different interface or query language? Learning a completely new system during an active forensic investigation is miserable. Know how it works ahead of time. There’s no one right answer, but there are wrong answers. Thirty days is probably too short for most purposes. Seven years is probably excessive unless you have specific regulatory requirements. Blind Spots You Didn’t Know You Had The dangerous thing about visibility gaps is that you often don’t know they exist until you need the data and discover it’s not there. You think you’re logging all administrative changes, but it turns out changes made through a particular API endpoint aren’t captured. You think you’re monitoring file access, but only on the primary file server, not the secondary one that got set up six months ago. You think you’re capturing authentication events, but only for interactive logins, not for service account activity. The time to discover these gaps is not during an incident. Periodically test your visibility. Run tabletop exercises where you simulate an incident and walk through what logs you’d need. Can you answer basic investigative questions? If not, you’ve found a gap. Review your logging configurations regularly. Environments change. New resources get added. Vendors update their logging capabilities (sometimes removing features, sometimes adding them). What was true six months ago might not be true now. Talk to vendors about their logging roadmap. If there’s a capability gap that matters to you, raise it. Sometimes vendors don’t prioritize logging features because customers don’t ask for them. Be the customer who asks. Building Visibility Deliberately You can’t fix everything at once, but you can make deliberate progress. Start with authentication. If you can’t see who’s logging in and from where, you’re operating blind. This should be foundational. Add administrative activity logging. Changes to security configurations, user permissions, infrastructure. This is high-value data that’s usually feasible to capture. Layer in access to sensitive data when possible. This is harder in SaaS environments, but for systems you control, implement it. For SaaS, push vendors to provide it or find compensating controls. Build detection gradually. Don’t try to alert on everything at once. Pick a few high-value detection use cases and implement those well. Then add more over time. Document what you can see and what you can’t. Be explicit about blind spots. This helps with risk discussions and helps prioritize what to fix next. And accept that visibility will never be complete. But better visibility than you had last quarter is still progress. Practical Takeaways Understand what logging your SaaS vendors actually provide before you commit to them. “Comprehensive audit logging” means different things to different vendors. Cloud (IaaS/PaaS) environments require deliberate configuration. Logging isn’t automatic. Know what you need to enable and what it costs. Prioritize authentication, authorization, and administrative actions. These are foundational and usually feasible to capture. Test your visibility periodically. Simulate incidents and see if you can answer investigative questions with the logs you have. Document blind spots explicitly. Knowing what you can’t see is itself valuable information. Balance retention costs against investigative and compliance needs. Not everything needs to be kept forever, but 30 days is usually too short. Build detection incrementally. Start with high-value use cases and expand over time rather than trying to alert on everything at once. Visibility is expensive and imperfect, but deliberate investment in the right areas makes a real difference when you need it. The post Week 4: The Logging and Visibility Problem No One Mentions appeared first on Cultivating Security.
-
8
Week 3: Fort Knox Isn’t the Goal: Learning to Live with Imperfect Security
Here’s something nobody tells you when you’re starting out: your job is not to eliminate risk. I know that sounds wrong. You got into security because you care about protecting things. You see the threats, you understand the vulnerabilities, you know what could go wrong. Your instinct is to fix everything, lock everything down, make the organization as secure as possible. That instinct will destroy you if you don’t learn to manage it. Because perfect security doesn’t exist. Even if you had unlimited budget (you don’t) and complete organizational support (you definitely don’t), you’d still have residual risk. Users will still click things they shouldn’t. Vendors will still get breached. Zero-days will still happen. Determined attackers will still find ways in. Your job isn’t to achieve perfect security. Your job is to understand the risks, help the organization make informed decisions about which ones to address and in what order, and implement controls that are proportional to both the threat and the business impact. That last part—proportional—is where early-career people struggle. The Expectations Gap You probably came into your first security role with a mental model of how things should work. Strong authentication everywhere. Comprehensive logging. Regular patching. Network segmentation. Principle of least privilege. All the things the frameworks and certifications said you should do. Then you saw the actual environment. Legacy applications running on operating systems that haven’t been supported in years. Service accounts with passwords that haven’t changed since they were created. Shadow IT everywhere. MFA deployed inconsistently at best. Logging gaps you could drive a truck through. Privileged access that makes no sense. Technical debt stacked so high you can’t see the top. And your first thought was probably: how is everyone just okay with this? Here’s the thing they’re not okay with it, exactly. They just understand something you haven’t learned yet: security exists in context. Business context, operational context, resource context. And in that context, “this is terrible but it’s what we can do right now” is sometimes the honest answer. Understanding Risk Tolerance (Yours Is Probably Wrong) When you’re early in your career, your risk tolerance tends toward zero. Every vulnerability feels critical. Every missing control feels like negligence. Every gap between current state and ideal state feels urgent. Your management’s risk tolerance looks different because they’re weighing things you might not see. Budget is finite. That security tool you want costs $200K annually. That’s three headcount they can’t hire. Or a business initiative that doesn’t get funded. Or infrastructure improvements that get delayed. Money doesn’t appear because “security is important.” There are trade-offs. Operational impact matters. The control you want to implement might be technically sound, but if it breaks a business process or creates so much friction that users just work around it, you’ve made things worse. Security that people bypass isn’t security. Opportunity cost is real. Time spent on one risk is time not spent on another. Maybe that medium-severity vulnerability in an internal tool matters less than getting MFA deployed. Maybe fixing the legacy system matters less than securing the new cloud environment properly from the start. Everything is relative. Regulatory requirements have teeth. Sometimes the organization has to meet specific compliance obligations that don’t align perfectly with what you think is most important from a pure security perspective. The audit finding has a deadline. The customer contract has requirements. Those carry business consequences. Likelihood and impact aren’t the same for every system. A critical vulnerability in an internet-facing application processing customer data is different from the same vulnerability in an isolated test environment. Context changes everything, and developing good intuition for that context takes time. The Maturity Curve You’re Not Seeing Here’s what often happens: you join an organization and immediately see all the gaps. You don’t see the progress because you weren’t there to see where they started. Maybe two years ago they had no centralized logging at all. Now they have it for critical systems. That’s not complete coverage, but it’s real improvement. Maybe MFA was only on VPN last year. Now it’s on email and major SaaS applications. Still not everywhere, but demonstrably better. Maybe the CMDB was complete fiction eighteen months ago. Now it’s only partially fiction, and there’s actually a process for keeping it updated. You’re measuring against an ideal state. Management is measuring against historical state and trajectory. Both perspectives have value, but understanding theirs helps calibrate your expectations. Progress in security is measured in years, not quarters. Culture change is slow. Budget cycles are annual. Organizational inertia is a real force. The fact that everything isn’t fixed yet doesn’t mean nothing is happening. Learning to Prioritize (This Is the Actual Skill) The ability to triage effectively and communicate risk proportionally—that’s what separates junior security people from senior ones. Not everything is critical. Actually critical means “if this gets exploited, the business suffers immediate, material harm.” Customer-facing production systems. Financial systems. Authentication infrastructure. Things where failure has direct, measurable consequences right now. Plenty of things are important. Fewer things are genuinely critical. Learn to tell the difference and communicate it honestly. If you call everything critical, nothing is. Exploitability and exposure matter as much as severity. A vulnerability in an internet-facing application is fundamentally different from the same vulnerability in something only accessible from a locked-down internal network. A misconfiguration in production is different from the same issue in a development sandbox. Threat modeling isn’t theoretical—it’s about understanding realistic attack paths in your specific environment. Compensating controls are real. Defense in depth means sometimes weakness in one area is offset by strength in another. If you can’t patch a legacy system immediately (maybe because it requires an outage during your busiest season, maybe because vendor support is complicated, maybe because testing takes weeks), you can segment it, add monitoring, restrict access, implement application-layer controls. Not ideal, but better than nothing—and sometimes “not ideal but implementable” beats “perfect but impossible.” Business context shapes everything. If fixing something requires downtime during year-end close, that’s a different conversation than if you can schedule it during a maintenance window. If implementing a control breaks a revenue-generating process, you’d better have a solid answer for what the alternative is or how to implement it without that impact. The Acceptance Part (This Is Hard) At some point you’re going to document a risk clearly, explain it thoroughly, propose a reasonable solution, and watch management decide not to act on it. This will feel bad. It should feel bad—you care about doing the job right, and someone just decided to accept a risk you’re not comfortable with. But here’s what you need to understand: if they’re the accountable party, it’s their call to make. Your job was to make sure they understood the risk, the potential impact, and the options for addressing it. Their job is to decide whether to accept it given all the other constraints and priorities they’re managing. Document it. Make sure the decision and the rationale are captured. Then move on to the things you can actually influence. Sometimes they’re making the wrong call. If that becomes evident later—whether through an incident or just operational reality—having clearly documented your position matters. Not for “I told you so,” but because it establishes that your risk assessment process is sound. That builds credibility for future discussions. Sometimes they’re making the right call based on context you didn’t have. Strategic plans that aren’t public yet. Customer commitments that shape priorities. Budget realities that aren’t your problem to solve. They might know things you don’t. And sometimes it’s genuinely ambiguous, and reasonable people can disagree about where the acceptable risk line should be drawn. That’s okay too. Perfect clarity is rare. What Burnout Looks Like (And How to Address It) If you treat every security gap as a personal failure, you will burn out. Fast. If you think your job is to achieve perfect security, you will fail. Because perfect security is impossible, and measuring yourself against an impossible standard is a recipe for misery. If you can’t let go of risks that management has accepted, you’ll spend all your energy fighting battles you can’t win instead of focusing on the ones you can. I’ve watched talented people leave the field entirely because they couldn’t reconcile the gap between how things should be and how things actually are. That’s a waste. The field needs people who care deeply about doing this work well. But caring deeply has to coexist with accepting that progress is incremental and perfection isn’t achievable. The mandate without authority trap Another burnout path: being given mandates without authority. “Get us PCI compliant” from leadership, but business units won’t change their processes. You can’t get certified without those changes, but you can’t force the changes. You’re accountable for an outcome you can’t control. Here’s what’s insidious about this pattern: the people giving you the mandate often don’t realize the position they’ve put you in. They genuinely don’t see the disconnect between “become PCI compliant” and the reality that you can’t force business process changes. They think they’ve given you a clear goal and the resources to achieve it. If you’re in this situation, you need to surface it explicitly. Not in a venting way, but clearly: “You’ve asked me to achieve X. To do that requires Y changes from business units. I don’t have authority to mandate those changes. What do you want me to do?” Sometimes that conversation reveals they didn’t understand the gap. Sometimes it gets you the support you need. Sometimes it clarifies that the mandate wasn’t as firm as you thought. But if you’re physically stressed, losing sleep, experiencing symptoms from the impossible position—that’s your signal that something has to change. Either the organization fixes the authority gap, or you need to be somewhere else. Don’t let it spiral to the point where it’s damaging your health. The Realistic Goal What you’re actually aiming for is incremental improvement over time. Building foundational capabilities. Demonstrating value so that when you ask for resources, you have credibility. Picking the battles that matter most and winning enough of them to move the needle. This requires honest dialog with management about what’s feasible. Not pessimistic “we can’t do anything,” but realistic “here’s what we can accomplish this year with current resources, here’s what would require additional investment, here’s what we need to accept as residual risk.” Setting realistic goals together prevents the mandate-without-authority trap and builds shared understanding of progress. Mature security isn’t Fort Knox. It’s understanding your risks, implementing controls that are proportional and sustainable, having visibility when things go wrong, and being able to respond effectively when incidents happen. It’s patching critical systems promptly even if you can’t patch everything immediately. It’s having MFA on your most sensitive resources even if it’s not everywhere yet. It’s knowing what your crown jewels are and making sure those are protected first. It’s logging what you actually need to detect and investigate incidents—not necessarily everything everywhere, which often creates more noise than signal. It’s having network segmentation around critical systems even if the entire environment isn’t perfectly segmented. It’s implementing least privilege for high-value access even if legacy applications still have overprivileged service accounts. It’s accepting that you’re managing risk in a complex environment with finite resources and competing priorities, and doing the best you can within those constraints. Calibrating Your Risk Tolerance Here’s what helps: Understand the organization’s context. A startup optimizing for speed to market has different risk tolerance than a bank. A non-regulated industry has different constraints than healthcare. Where you are shapes what’s acceptable. Prioritize based on business impact, not just technical severity. The CVSS score is a data point. It’s not the decision. A critical vulnerability in an isolated lab environment is different from a medium vulnerability in your customer-facing authentication system. Context matters more than the number. Celebrate progress, not just perfection. If you went from 60% patch compliance to 85%, that’s real improvement. Acknowledge it. Build on it. Don’t just fixate on the remaining 15%. Know when to escalate and when to document and move on. Some risks are worth fighting for. Some you raise clearly, document thoroughly, and then let go when the decision goes against you. Wisdom is knowing the difference. Learn to communicate risk, not fear. “We could get breached” is useless. “Here’s the specific scenario, here’s the likelihood based on our environment, here’s the business impact if it happens, and here’s what it would cost to address it”—that’s useful. It gives decision-makers what they need. Public breaches are valuable here. “This happened to [Company X] because of [weakness]. We have the same pattern in our environment. Here’s how it would play out for us.” Real examples are more credible than theoretical scenarios, and they help leadership understand this isn’t hypothetical fear-mongering—it’s demonstrated risk based on what’s actually happening in the industry. Taking Care of Yourself This job is a marathon, not a sprint. If you’re constantly anxious about risks you can’t control, you won’t last. You have to find a way to care deeply about the work without carrying the weight of every unresolved issue home with you. To do your job well without letting the inevitable imperfection become a source of constant stress. Part of that is maintaining open dialog with your management about reality. What do they actually expect? What’s feasible given current resources and authority? What are the persistent roadblocks preventing progress? If you’re stuck in an endless cycle—closing one audit just to open another, fixing a decade of accumulated debt while being measured against perfect compliance, dealing with constant emergencies that prevent systematic improvement—that needs to surface explicitly. Not as complaint, but as factual assessment: “Here’s what we’re being asked to achieve, here’s what’s preventing us from getting there, something has to change.” Keeping that internal helps no one. Letting it build until you’re ready to quit doesn’t serve you or the organization. Your risk register can help with the psychological burden too. Include an ownership column—what business unit owns this process, system, or application? When you document a risk, assign it to the appropriate business owner. Not as blame, but as accurate accountability. You identified the risk. They own the decision about whether to address it. That distinction matters. A risk owned by the finance team because of their legacy accounting system isn’t your risk to carry home. You’ve done your job by surfacing it. The pressure to fix it belongs with the people who own the system and make decisions about it. This is especially important for risks that existed before you arrived—clearly documenting ownership helps you stop carrying responsibility for decisions made years ago by people who aren’t even there anymore. Beyond these structural approaches—dialog with management, clear ownership in your risk register—you also need internal strategies for managing the day-to-day stress. Some people do this by being very clear about what’s in their control and what isn’t. Some do it by focusing on incremental wins and measuring progress over time. Some do it by maintaining perspective—remembering that even imperfect security is better than no security, and that the work you’re doing matters even if it’s not perfect. However you manage it, you need to manage it—both structurally and personally. Because burning out doesn’t help anyone, least of all the organizations that need competent security people who can operate effectively over the long term. The Long View Security maturity is measured in years. Culture change takes time. Budget cycles are annual. Organizational inertia is real. Your job in the early part of your career is to learn how to operate effectively in imperfect environments. To build skills in prioritization, communication, and risk management. To understand that doing security well means making trade-offs, not achieving perfection. The people who last in this field are the ones who figure out how to care deeply about the work without letting the inevitable imperfection destroy them. Fort Knox isn’t the goal. A security posture that’s proportional, sustainable, and measurably better than it was last year—that’s the goal. And that’s hard enough without making it harder on yourself. Practical Takeaways Calibrate your risk tolerance to organizational context. What’s acceptable varies by industry, regulatory environment, and business model. Communicate risk clearly: specific scenario, realistic likelihood, business impact, cost to address. Give decision-makers what they need. Prioritize based on business impact and realistic threat, not just technical severity scores. Context matters more than CVSS numbers. Maintain a risk register of identified risks—addressed, accepted, or pending. This guides future prioritization, shapes tool and project planning, and provides organizational awareness of what you’re carrying. Celebrate incremental progress. Going from bad to less bad is still improvement worth acknowledging. Know when to fight and when to move on. Not every battle is worth having. Wisdom is knowing which ones matter. Take care of yourself. Find a way to care about the work without letting unresolved risks consume you. Imperfect security that’s continuously improving beats the impossible pursuit of perfection. The post Week 3: Fort Knox Isn’t the Goal: Learning to Live with Imperfect Security appeared first on Cultivating Security.
-
7
Week 2: Understanding Your Environment Before You Try to Secure It
You can’t protect what you don’t know exists. That should be obvious. But based on how most security programs operate, it apparently isn’t. People want to jump straight to the interesting work. Threat hunting. Incident response. Penetration testing. Red team exercises. And sure, that stuff matters. But if you don’t have a solid grasp of what’s actually in your environment—what systems exist, how they’re connected, who has access to what, where the data lives—you’re building security controls on top of quicksand. Asset inventory isn’t sexy. Documentation isn’t exciting. Mapping out authentication flows and data paths doesn’t feel like real security work. But without that foundational knowledge, everything else you do is guesswork. The Problem Nobody Wants to Acknowledge Most organizations don’t actually know what they have. Oh, they think they do. There’s a CMDB somewhere. There’s documentation. There are diagrams that got made three years ago when the environment looked completely different. There’s tribal knowledge locked in the heads of people who’ve been there forever. But when you actually start digging, you find systems that aren’t documented. Cloud resources that got spun up for a project and never decommissioned. Service accounts that nobody remembers creating. Shadow IT that’s running critical business processes. Integrations between applications that aren’t captured anywhere. The documentation lies. Not intentionally—it just drifts. Environments change faster than documentation gets updated. People leave and take their knowledge with them. Projects get implemented without updating the architecture diagrams. Exceptions become permanent without anyone acknowledging it. So you end up with a gap between what you think your environment looks like and what it actually looks like. And that gap is where security failures hide. Why This Happens Partly it’s because maintaining accurate asset inventory is tedious, unglamorous work. It’s much more fun to deploy a new security tool than to verify that your CMDB reflects reality. Partly it’s because environments are dynamic. In the days when infrastructure was mostly physical and changes happened slowly, you could maintain reasonably accurate documentation. Now, with cloud environments where resources get created and destroyed programmatically, with containerized applications that scale automatically, with SaaS integrations that happen outside IT’s visibility—keeping up is genuinely hard. Partly it’s because organizations don’t treat this as a priority until something breaks. Asset inventory doesn’t prevent breaches in an obvious, measurable way. It’s foundational capability that enables everything else, but that value is indirect and easy to overlook. And partly it’s because the people who understand the environment best—the engineers and administrators who actually run things day-to-day—are usually too busy keeping things operational to document what they know. What You’re Actually Trying to Understand Asset inventory sounds like it’s about making a list. It’s not. It’s about understanding your environment well enough to make informed security decisions. You need to know what systems exist, but you also need to know what they do. An undocumented web server is a problem. An undocumented web server that’s processing customer payment data is a much bigger problem. Context matters. You need to know how things are connected. Not just network topology—authentication flows, data flows, trust relationships, dependencies. When something breaks or gets compromised, what else gets affected? You can’t answer that if you don’t understand the relationships. You need to know who has access to what. Not just in terms of user accounts—service accounts, API keys, federated access, third-party integrations. Identity sprawl is real, and most organizations have only a vague sense of the access that’s been granted over time. You need to know where sensitive data lives. Not just primary storage—backups, logs, development environments, analytics platforms, third-party services. Data goes places you don’t expect, and if you don’t know where it is, you can’t protect it appropriately. You need to know what’s internet-facing versus what’s internal. What’s in scope for compliance versus what isn’t. What’s critical to business operations versus what’s nice-to-have. These distinctions shape your security priorities. These aren’t theoretical concerns; I’ve seen this play out in painful ways. A team confidently told auditors they had production data and one backup. Straightforward. Six months later, urgency around a high-profile industry breach prompted deeper digging, and the same team now mentioned three backups. Actual investigation found twenty copies of the production database scattered across prod, dev and test environments—some with proper data protection, some with half-implemented controls, some with custom “security” that was trivial to bypass. None of it was documented. None of it was in the asset inventory. The team wasn’t lying—they genuinely didn’t know the full extent of what existed. The Discovery Process (It’s Never Done) If your organization doesn’t have good asset inventory, you can’t just fix it all at once. This is incremental work. And during discovery, resist the urge to fix things. You’re going to find issues that make you want to immediately remediate. Don’t. Not yet. Your job right now is to understand what exists, not to judge it or fix it. Document what you find neutrally. That server running an unsupported OS? Note it. That hardcoded credential? Note it. That undocumented integration? Note it. Once you have a complete picture, patterns emerge. What looked like a critical issue in isolation might be lower priority when you see it in context. What seemed manageable might be part of a systemic problem. You can’t prioritize effectively until you know the full scope. Discovery first, judgment later, remediation last. (Obviously if you discover active malicious activity or an ongoing breach, that’s different. But technical debt and configuration issues can wait until you understand the landscape.) Start with what you can see easily. Network scans give you IP addresses and open ports. Cloud provider consoles show you what’s running in your AWS or Azure environments. Your CMDB or asset management system—however outdated—gives you a baseline. Then start filling in the gaps. Talk to the people who actually run things. The network team knows about VPN concentrators that aren’t documented. The database administrators know about legacy systems that “can’t be touched.” The developers know about temporary integrations they built to solve a business problem two years ago, but never got around to making it ‘formal’. But don’t just ask where things are—ask about the exceptions. “Where do VoIP phones live?” might get you “on the phone VLAN.” “Are there any exceptions?” reveals “oh yeah, the branch office, we never got around to creating the phone VLAN there.” The standard answer tells you the design. The exceptions tell you the reality. Always ask both. Map out authentication flows. How do users log into things? Where does SSO apply, where does it not apply? Where are there standalone authentication systems? What service accounts exist and what do they have access to? This is harder than it sounds, especially in organizations that have grown through acquisition or have a lot of legacy applications. Trace data flows. Where does customer data originate? Where does it get processed? Where does it get stored? Where does it get backed up? What third parties have access to it? This is critical for both security and compliance, and most organizations have only a partial picture. Document exceptions and technical debt. That server running Windows Server 2012 that “can’t be upgraded because the vendor doesn’t support the new OS.” That application with hardcoded credentials because “that’s how it was built ten years ago.” Undocumented systems that should have been decommissioned years ago but are still running critical processes. These things exist, and pretending they don’t doesn’t make them go away. And this isn’t one-time work. Automated discovery helps. Configuration management databases help. But keeping this information current requires ongoing discipline. Changes need to be documented. Exceptions need to be tracked. Drift needs to be caught and corrected. This is where a risk register becomes essential. Not as bureaucracy, but as a practical tool. As you discover and validate technical debt and security gaps, capture them in one place: what the issue is, when you discovered it, when it became a risk (if different), and enough context to understand it later. That timeline matters. Finding an EOL Windows Server 2012 instance in May 2025 tells you it’s been unsupported since October 2013—this is long-standing technical debt, not a new problem. Those timestamps become valuable later when you’re demonstrating progress. “We remediated a risk that existed for six years” tells a better story than “we fixed a server.” It shows you’re addressing real organizational debt, not just checking boxes. Don’t worry about formal risk assessments yet. Just capture what you’re finding: the issue, when you discovered it, when it became a risk, and your observations in a notes field. Known facts, context, anything that’ll help you understand it later. If you want to add a quick severity estimate, fine. But the priority right now is getting visibility into what exists, making it become Known. You can formalize scoring and prioritization later when you’re ready to decide what needs to get fixed first. Why Security People Often Skip This Because it feels like IT work, not security work. New security practitioners want to do security things. Hunt threats. Respond to incidents. Test for vulnerabilities. Asset inventory feels like something someone else should handle. But here’s the reality: if you don’t understand your environment, your security work is built on assumptions that might be wrong. You’re scanning for vulnerabilities in systems you know about, but the unpatched server nobody told you about isn’t getting scanned. You’re monitoring authentication logs for known systems, but the shadow IT application isn’t in your visibility. You’re enforcing access controls based on documented integrations, but the undocumented API connection is bypassing those controls. You can’t secure what you don’t know exists. And in most organizations, there’s a lot that isn’t known. The Business Case (Since You’ll Need One) When you ask for time and resources to improve asset inventory and documentation, management is going to want to know why it matters. Incident response is the obvious answer. When something goes wrong, you need to know what’s affected, what it connects to, who has access, and where the data is. If you’re figuring that out during an active incident, you’re behind. Good asset inventory means you can scope and respond faster. Compliance is another lever. Most frameworks require asset inventory. You can’t demonstrate that you’re protecting data appropriately if you don’t know where the data is. Auditors will ask for this, and “we don’t really have a current inventory” is not an answer they accept. Risk management depends on understanding what you have. You can’t assess risk accurately if you don’t know what assets exist and what they’re used for. Your risk register is fiction if it’s based on an incomplete understanding of the environment. Here’s something management often doesn’t understand: auditors actually give you more credit for a comprehensive risk register than a short one. Leadership sometimes thinks documenting risks makes the organization look bad—airing dirty laundry. The opposite is true. A risk register with 50, 100, 200 items tells auditors you have awareness of your environment and your gaps. You know that Server 2012 instance exists, you know why it hasn’t been upgraded, you’re tracking it. Auditors have seen technical debt before. They get it. What makes auditors nervous is a risk register with twelve entries and management acting like that’s comprehensive. That tells them you don’t actually know what’s in your environment, and they’re going to start digging to find what you’ve missed. A well-maintained risk register demonstrates maturity, not weakness. Vulnerability management doesn’t work without asset inventory. You can’t patch what you don’t know about. You can’t prioritize remediation if you don’t understand what systems are critical. You can’t measure progress if your baseline is wrong. Change management and operational stability benefit from accurate documentation. Understanding dependencies means you can predict what breaks when you make changes. Knowing what’s running where means you can plan maintenance windows appropriately. But honestly, the real reason is simpler: you can’t do security work effectively if you don’t understand what you’re securing. Everything else builds on this foundation. What Good Looks Like (It’s Still Imperfect) Even mature organizations don’t have perfect asset inventory. Environments are too dynamic, change happens too fast, people make mistakes. But good organizations have processes that catch drift. They have automated discovery tools that run regularly. They have change management processes that require documentation updates. They have accountability—someone owns the CMDB, and someone cares whether it’s accurate. They know what their crown jewels are and make sure those are documented thoroughly. They might not have perfect visibility into every development environment, but they know exactly what’s in production and what’s handling sensitive data. They treat documentation as an operational requirement, not a nice-to-have. When something changes, the documentation changes. When exceptions are granted, they’re tracked. When new systems are deployed, they get added to inventory before they go live. They have multiple sources of truth and reconcile them. The CMDB, the cloud provider inventory, the network management system, the vulnerability scanner—these should generally agree, and when they don’t, someone investigates why. And they accept that this is ongoing work. Asset inventory isn’t a project you complete. It’s a continuous process that requires attention and discipline. Starting From Where You Are If you’re in an organization with poor asset inventory, you’re not going to fix it overnight. This is a multi-month effort, minimum. Possibly multi-year if the environment is large and complex. Pick a starting point that matters. Maybe it’s internet-facing systems, because those are the most exposed. Maybe it’s systems handling regulated data, because that’s what auditors care about. Maybe it’s authentication infrastructure, because identity is foundational to everything else. Document what you find, including the gaps. “We believe these are all the internet-facing systems, but we don’t have confidence in this list because X, Y, Z.” Being honest about what you don’t know is better than pretending you have complete visibility. Build relationships with the people who know things. The senior network engineer who’s been there fifteen years. The DBA who knows where all the data actually is. The DevOps folks who understand the cloud environment. They have knowledge that isn’t written down anywhere, and you need it. Automate what you can. Discovery tools, cloud inventory scripts, vulnerability scanners—use them. But don’t trust them completely. Automated tools find what they’re configured to look for. They miss things. Verify. Make incremental progress visible. “Last quarter we had 200 undocumented systems. This quarter we have 150.” Progress matters, even if you’re not done. And accept that you’re never really done. Environments change. New systems get added. Old systems get decommissioned (sometimes without telling anyone). This is ongoing work. The Payoff When you actually understand your environment, everything else gets easier. Incident response is faster because you know what’s connected and what’s affected. Vulnerability management is more effective because you’re patching systems that actually matter. Access reviews are possible because you know what access has been granted. Compliance is less painful because you can demonstrate what controls are in place. Your risk register becomes a working tool instead of compliance theater. You can prioritize what gets fixed based on actual impact, not just what’s most visible. You can show progress over time—risks remediated, technical debt reduced, gaps closed. Leadership can see the work you’re doing instead of just hearing that security is “working on things.” You can have informed conversations about risk instead of vague hand-waving. You can prioritize security work based on actual business impact instead of whoever yells loudest. You can make architectural decisions with confidence instead of hoping you haven’t missed something critical. But more fundamentally, you can do security work that actually makes sense for your organization. Not generic best practices that might not apply. Not vendor recommendations that assume perfect visibility. Security that’s grounded in the reality of what you’re actually trying to protect. That’s worth the unglamorous work of figuring out what you have. Practical Takeaways Start with what’s most critical to the business or most exposed to risk. You can’t document everything at once, so prioritize. Use multiple sources and reconcile them. No single tool or database has complete truth. Cross-reference and investigate discrepancies. Talk to people who actually operate the environment. Documentation is never complete. Tribal knowledge is real, and you need access to it. Build and maintain a risk register as you discover gaps. Capture what you find, when you found it, when it became a risk, and context. Don’t judge or try to fix during discovery—just document neutrally. Document what you don’t know as explicitly as what you do know. Gaps in visibility are themselves important information. Build processes that keep information current. Asset inventory is continuous work, not a one-time project. Make it someone’s job. If nobody owns this, it doesn’t get maintained. Accountability matters. Accept imperfection but aim for constant improvement. You’ll never have perfect visibility, but you can always have better visibility than you did last quarter. Security work starts with understanding what you’re securing. Everything else builds on that foundation. Get this right first. The post Week 2: Understanding Your Environment Before You Try to Secure It appeared first on Cultivating Security.
-
6
Week 1: Introduction: Foundations That Nobody Teaches
There’s a gap in how people learn security work. Not a small one. You can get certified six ways from Sunday. You can read every framework document NIST ever published. You can know the OWASP Top 10 backwards and forwards. And you’ll still walk into your first real security role completely unprepared for how the work actually functions. Because nobody teaches you the organizational part. The political part. The part where your technically perfect solution dies in a budget meeting. The part where you discover that half your environment isn’t documented, a quarter of it is running on systems that haven’t been patched in two years, and everyone just… works around it. Nobody teaches you how to prioritize when everything looks critical. How to communicate risk to people who don’t think in terms of attack vectors. How to build security in organizations where you’re not the one making decisions. How to do the job well in environments that are messier than any textbook ever acknowledged. This series is about filling that gap. Who This Is For This is written for people with roughly 1-5 years in IT or security. You understand the technical fundamentals. You know what authentication means, how logging works, what APIs do, how cloud environments function. You’re not looking for “Security 101” content. What you’re looking for—whether you know it yet or not—is how to operate effectively in actual organizations. How to navigate the friction between what should happen and what’s actually possible. How to develop the judgment that separates people who know security from people who can actually get security work done. If you’re earlier in your career than that, some of this might not land yet. That’s fine. Bookmark it and come back when you’ve seen enough organizational reality to recognize what’s being described. If you’re later in your career, you’ve probably learned most of this already—the hard way. Maybe you’ll still find value in seeing it articulated clearly, or maybe you’ll just nod along and think “yeah, that tracks.” What This Series Covers Twelve topics, published weekly, in a deliberate sequence: Understanding Your Environment Before You Try to Secure It — why visibility and asset knowledge are foundational, not optional. Fort Knox Isn’t the Goal — learning to manage risk instead of eliminating it, and why your risk tolerance is probably miscalibrated. The Logging and Visibility Problem No One Mentions — the gap between what you think you can see and what you actually can see, especially in SaaS. The Identity Sprawl Problem — why identity is the real perimeter now and why it’s so damn hard to manage. Vendor Relationships Aren’t Partnerships — how to assess vendor risk beyond security questionnaires and why “we take security seriously” means nothing. Reporting to IT: How to Build Security When You’re Not in Charge — strategies for security practitioners working under non-security leadership. Why Security Projects Fail (And It’s Usually Not Technical) — the organizational and political dynamics that kill initiatives before they start. Reading the Room: What Your CISO Actually Cares About — translating technical risk into business language and understanding executive constraints. Compliance Is Not Security (But You Still Have to Care) — how frameworks actually work and how to use them without letting them define your entire program. When ‘Best Practices’ Don’t Apply — making intelligent trade-offs when reality prevents textbook implementations. Incident Response Is Half Politics — the organizational dynamics of actual incidents and why your IR plan won’t survive first contact. Learning from Incidents You Didn’t Have — building pattern recognition from public breaches without becoming paralyzed by threat awareness. The first four posts establish reality: what environments actually look like, how to think about risk, and the visibility and identity challenges that underpin everything else. The middle section covers organizational navigation: vendors, reporting structures, project failure modes, and communication. The final posts address judgment and crisis: compliance frameworks, adapting best practices, handling incidents, and learning from external events. Each piece stands alone. But they build on each other. Concepts introduced early get referenced later when they become relevant in new contexts. What This Series Isn’t This isn’t vendor-neutral tool reviews. This isn’t certification prep. This isn’t step-by-step technical tutorials. This isn’t going to tell you how to configure a SIEM or write detection rules or implement zero trust architecture. There are other resources for that, and many of them are quite good. This is about the stuff that matters just as much as technical skills but rarely gets explained clearly: how to operate in imperfect environments, how to communicate effectively with people who don’t speak security, how to prioritize when resources are finite, how to build credibility so that when you ask for something you actually need, people listen. It’s about developing the organizational literacy and pattern recognition that usually takes a decade of painful experience to acquire. The Approach Everything here is grounded in real-world practice. Not theory. Not aspiration. Not what the white papers say should happen. The perspective comes from someone who’s been doing this long enough to have lived through the failures, the vendor surprises, the incidents, the organizational friction, the budget fights, and the slow grind of actually building security programs in environments that weren’t designed for it. This isn’t an exhaustive treatment of every topic. It won’t cover every nuance or edge case. It’s the things I wish someone had explained to me earlier in my career—or maybe they did try to explain them, but I wasn’t ready to hear it yet. Sometimes the lesson doesn’t land until you’ve seen enough to recognize what it means. These are the patterns and dynamics that took me years to understand, laid out in ways I hope will click faster for you. One More Thing Security work is hard. Not just technically hard—organizationally hard. You’re going to face situations where you know what should be done, and it’s not going to happen. You’re going to raise risks that don’t get addressed. You’re going to watch decisions get made that you disagree with. Learning to do this work well means learning to operate effectively in that reality without burning out or becoming cynical. The people who last in this field are the ones who figure out how to care deeply about the work while accepting that progress is incremental, resources are finite, and perfection is impossible. That’s a hard balance to strike. But it’s a necessary one. This series won’t make the work easier. But it might help you understand it better. We’ll start next week with the foundation everything else builds on: understanding your environment before you try to secure it. The post Week 1: Introduction: Foundations That Nobody Teaches appeared first on Cultivating Security.
-
5
When Your Vendor Drops a Security Layer (And Doesn’t Tell You)
Back in November, there was a piece on KrebsOnSecurity about the Cloudflare outage — particularly companies that chose to bypass Cloudflare entirely to get their services back online. I wrote an internal analysis / lessons learned and sent it to my IT Peers on it at the time. Over the past month it’s come up in a few conversations, and this week while working on some blog posts for January, it surfaced again. There’s an angle here I think got missed in the initial coverage. So here’s my read on it, with a month of distance. (Cloudflare published a detailed post-mortem of the incident, which is worth reading for the technical depth. What follows is not about Cloudflare’s response — which was transparent and thorough — but about what happened downstream when companies and SaaS vendors chose to bypass Cloudflare entirely during the outage.) The Operational Decision That Made Sense Operationally, bypassing Cloudflare made sense in the moment. Website’s down, customers are waiting, business is bleeding. Route around the problem and get back online. Nobody’s going to argue with the urgency. According to reporting from KrebsOnSecurity, there was roughly an eight-hour window when several high-profile sites decided to bypass Cloudflare for the sake of availability. Some companies were able to pivot away temporarily; others couldn’t because their DNS was also hosted by Cloudflare or because the Cloudflare portal itself was unreachable. But here’s what kept sticking with me: Cloudflare wasn’t just a CDN or performance layer for a lot of these companies. It was a significant part of their defense-in-depth strategy. And here’s the critical nuance that I think matters: Cloudflare isn’t actually a single layer of defense-in-depth. It’s a concentration of multiple security controls delivered through a single platform. What Actually Got Removed When you pulled Cloudflare out of the path — even temporarily — you didn’t just remove “a layer.” You removed several interacting controls at once: DDoS mitigation (L3/L4/L7) Bot management Rate limiting WAF rule enforcement Request normalization and sanitization TLS termination and policy enforcement IP reputation filtering Geo-based access controls Abuse and anomaly detection If you didn’t have a mature, well-tuned WAF of your own sitting behind Cloudflare — and more importantly, if you didn’t have comparable controls for rate limiting, bot detection, IP reputation, and request scrubbing — you may have just exposed yourself to multiple attack vectors simultaneously. Attack vectors that had been quietly mitigated for years, to the point where you forgot they existed. As Aaron Turner from IANS Research pointed out to KrebsOnSecurity: “Your developers could have been lazy in the past for SQL injection because Cloudflare stopped that stuff at the edge. Maybe you didn’t have the best security QA for certain things because Cloudflare was the control layer to compensate for that.” That’s the risk of outsourcing security controls without understanding what you’re outsourcing. Those controls compound each other. They’re designed to work together. Losing them together is far more dangerous than losing a single, isolated control. And here’s an important nuance from Cloudflare’s post-mortem: the outage impacted their Bot Management system and caused widespread HTTP 5xx errors across their core CDN and security services. But not all of Cloudflare’s protections failed at the same time. When vendors bypassed Cloudflare entirely to restore service, they weren’t just removing failed protections — they were removing all of Cloudflare’s protections, including DDoS mitigation, WAF rules, and rate limiting that were still functioning. The Questions That Should Have Been Asked So the question becomes: did anyone pause to think about that in the moment? Or did they just act? Did Security have a say in the decision to bypass? Did they understand they were dropping multiple layers of protection at once? Did they have equivalent controls ready to absorb the gap — not just a WAF, but rate limiting, bot detection, abuse monitoring, and more? Or was the decision made in a war room where Security wasn’t even present? The Structural Problem at Smaller SaaS Vendors Or — and I think this is closer to reality for a lot of SaaS vendors — was there no separation between the person making the operational decision and the person responsible for security? Because here’s the thing: SaaS vendors come in all shapes and sizes now. DevOps, DevSecOps, small engineering teams where the “senior” engineer is also the security person. Or even smaller vendors with outsourced vCISOs who aren’t involved in real-time operational decisions at all. When the person responding to an outage is wearing both the operations hat and the security hat, where does their mind default to under pressure? Can they even think about security and operations simultaneously in that moment? Or does “get the service back online” override everything else because that’s the immediate, visible, measurable problem in front of them? What Probably Actually Happened I suspect in a lot of cases, Security wasn’t bypassed because someone made a conscious risk decision. Security was bypassed because the person making the call didn’t have the bandwidth — or the organizational structure — to think about it as a security decision at all. It was purely operational. And here’s the kicker: a lot of those vendors probably never did an internal RCA on their own actions. The root cause was “Cloudflare outage” — pointed finger, case closed. They never analyzed the downstream implications of their bypass decision. They may not have even realized they collapsed multiple layers of defense-in-depth — those layers had been invisible to them all along. That’s not malicious. That’s just reality for a lot of smaller SaaS providers operating without dedicated security staff or mature incident response processes. But it doesn’t change the risk their clients just inherited. The Pressure Is Real, But So Are the Implications I’m not being naive here. Business continuity pressure is real. Uptime SLAs are real. Executive pressure during an outage is very real. But bypassing a constellation of critical security controls without understanding the downstream implications is exactly how well-intentioned decisions introduce avoidable risk. And I suspect in many cases, the decision wasn’t “we understand the risk and we’re accepting it.” It was “get the site back up” — and nobody stopped to think about what they were removing in the process, because there was nobody whose job it was to stop and think about it. This is a reminder that defense-in-depth only works if every layer is understood, maintained, and included in decision-making during an incident. And when those layers are consolidated into a single vendor platform, it’s easy to forget just how much you’re actually relying on until it’s gone. The SaaS Vendor Problem But here’s the part that really stuck with me — and the reason this kept surfacing over the past month. What about the SaaS providers who also use Cloudflare — especially the ones serving critical industries like Financial Services? Did they bypass Cloudflare to restore service? Almost certainly some did. Did they communicate that decision to their clients? Did they explain the risk they were accepting on behalf of those clients? Or were customers never informed that multiple major security controls — controls they assumed were present because they had been for months or years — were suddenly bypassed during an outage window? Here’s where it gets even messier: a lot of SaaS vendors don’t explicitly tell their clients they’re using Cloudflare. They just list the security controls in their sales deck or RFP response: “We have DDoS protection. We have WAF enforcement. We have bot mitigation. We have rate limiting. Robust security.” And they do — because Cloudflare is providing it. But the client doesn’t necessarily know that. They assume those controls are baked into the vendor’s architecture. They breeze through due diligence because the vendor checked all the boxes. The security questionnaire gets approved. The contract gets signed. And then one day, the vendor bypasses Cloudflare to restore service during an outage — and suddenly, those security controls the client thought were intrinsic to the platform? Gone. Temporarily, maybe. But gone. The client had no idea those capabilities were outsourced. They had no idea they were dependent on a third-party service. And they had no idea that “restoring service” meant removing DDoS protection, WAF enforcement, and bot mitigation all at once. That’s the piece that keeps surfacing in conversations. Cloudflare themselves were transparent about the November outage — publishing a detailed post-mortem with root cause analysis, timeline, and remediation steps. That’s the kind of communication you’d expect from a mature infrastructure provider. But that transparency doesn’t extend to what their customers did in response. Did those SaaS vendors communicate to their clients that they bypassed Cloudflare? That they temporarily removed the security controls they’d been relying on for months or years? Most likely, no. The Visibility Gap Financial institutions perform vendor due diligence under the assumption that their SaaS providers’ architectures remain stable unless there’s a formal change, a review cycle, or some kind of communication (an expectation reinforced by regulatory guidance on third-party risk management). If a vendor quietly removes DDoS protection, WAF enforcement, bot mitigation, and rate limiting all at once during an incident, that changes their risk profile immediately. But unless the vendor is transparent enough to say it out loud, the client has no visibility into that decision. And here’s the longer-term question: What happens if, in the coming weeks or months, one of these SaaS vendors announces a breach? What if the root cause turns out to be traceable back to mid-November — to the days or hours where they bypassed Cloudflare and exposed themselves in ways they hadn’t in years? Will anyone connect the dots? Or will it be one of those hindsight moments where someone finally realizes: “Oh. That’s how they got in. We dropped DDoS protection, rate limiting, and bot detection all at once to restore uptime, and an attacker walked right through the gap.” And attackers were watching. As Turner told Krebs: “Let’s say you were an attacker, trying to grind your way into a target, but you felt that Cloudflare was in the way in the past. Then you see through DNS changes that the target has eliminated Cloudflare from their web stack due to the outage. You’re now going to launch a whole bunch of new attacks because the protective layer is no longer in place.” According to Cloudflare’s timeline, the outage lasted roughly 5.5 hours, with severe impact for about 3 hours. But the window where companies bypassed Cloudflare was reportedly around 8 hours. If a SaaS vendor bypassed Cloudflare around noon UTC and waited until evening to restore it (to be safe), that’s potentially a 6-8 hour window where they were operating without DDoS protection, WAF enforcement, bot mitigation, and rate limiting. Six to eight hours is a long time for an attacker scanning for newly exposed infrastructure. And this was a highly publicized incident happening in real-time, not a silent configuration change. As Turner told Krebs, attackers tracking specific targets could see through DNS changes the moment Cloudflare was removed — and immediately launch attacks they’d been planning but couldn’t execute while Cloudflare was in the way. Cloudflare’s post-mortem explicitly states the outage “was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind.” It was a technical failure in their Bot Management system. But when SaaS vendors bypassed Cloudflare in response to that technical failure, they may have inadvertently created a security exposure that could be exploited by actual attackers. That’s the risk of making operational decisions under pressure without thinking through the security implications. The Liability Question Nobody’s Asking And then there’s the question that really interests me: where does the liability land? If a client opened a support ticket demanding the service be restored immediately, does that constitute implicit approval to drop security controls? If it’s documented in the ticket — “customer requested immediate restoration” — does that shift accountability back to the client? Or is the vendor still responsible for explicitly communicating the trade-off: “We can restore service, but it means removing DDoS protection, WAF enforcement, bot mitigation, and rate limiting. Do you accept that risk?” I’m not a lawyer, but I’d be fascinated to see how this plays out in litigation if it ever gets there. How would a plaintiff’s attorney position this? How would defense counsel respond? Because we’re already seeing a shift in how breach litigation unfolds. It’s not just the company that got breached anymore — it’s the company and their SaaS vendor, named together in the lawsuit. The argument being: the vendor was responsible for securing the data, the client was responsible for choosing and overseeing the vendor, and both failed. If a breach traced back to a Cloudflare bypass during this outage, you could see arguments going both ways: Plaintiff’s side:“The vendor removed critical security controls without informing the client, materially changing the security posture the client relied on during due diligence and contracted for. The vendor failed in their duty to protect the data entrusted to them.” Defense (vendor) side:“The client demanded immediate service restoration. We documented the request. Restoring service required architectural changes. The client’s demand implicitly accepted the operational trade-offs necessary to meet their requirement.” Defense (client) side:“We requested service restoration, not security degradation. We had no visibility into the vendor’s architecture. We were never informed that ‘restore service’ meant ‘remove DDoS protection and WAF enforcement.’ The vendor should have communicated the security implications before acting.” I genuinely don’t know how that shakes out. But the ambiguity alone is a problem. Because right now, most SaaS contracts don’t clearly define who’s accountable when operational decisions during an incident materially change the security posture. And until we see some case law or regulatory guidance, both sides are operating in a gray area that could get very expensive to navigate after a breach. Two Different Kinds of Defense-in-Depth This incident highlights something we don’t talk about enough: there are actually two layers of defense-in-depth at play in modern infrastructure. First, there’s our own defense-in-depth — the controls we design, deploy, maintain, and fully understand. If we remove one of our layers (or several at once), we understand the risk and can control the compensating actions. Second, there’s the defense-in-depth of the SaaS providers who store or process our customer data. They are effectively an extension of our infrastructure. Their architectural decisions directly impact the security of our data. The Authority vs. Responsibility Gap But here’s the problem: we don’t control their day-to-day decisions. We’re not in the room when they make operational trade-offs during an outage. We perform due diligence at onboarding and during periodic reviews, but we don’t have decision authority over what they do in the moment. So when a SaaS provider modifies or bypasses multiple security controls to restore service, the downstream exposure shifts directly to us. We’re still accountable for the data we’ve placed in their platform, but we have no practical way to influence or halt the decision they made in real time. That’s the inherent challenge with the SaaS model: we carry the responsibility, but not the decision authority. And in that space — yes — it becomes an unavoidable form of blind trust, simply because the model offers no other option unless the vendor communicates proactively. What This Means Practically If You’re a SaaS Provider If you provide SaaS services to regulated industries or handle sensitive data on behalf of clients, this outage should be a forcing function for how you think about transparency during incidents. When you make architectural changes under pressure (bypassing a CDN, turning off a WAF, relaxing rate limits, disabling bot protection — whatever the combination), your clients need to know. Not three months later in a compliance report. Not after a breach investigation uncovers it. In the moment, or as close to it as possible. Because the risk you’re accepting isn’t just yours. It’s theirs too. If You’re a Client of SaaS Vendors If you rely on SaaS vendors to protect critical data, this is a reminder that due diligence can’t stop at onboarding. You need ongoing visibility into how your vendors operate during incidents. You need contract language that requires transparency around security posture changes — especially when multiple controls are bypassed at once. You need to ask the uncomfortable questions about what happens when something breaks and they need to route around their own protections. Because “we trust our vendor” is not a control. It’s a hope. And hope is not a strategy. The Questions You Should Be Asking Nicole Scott from Replica Cyber called the Cloudflare outage “a free tabletop exercise, whether you meant to run one or not.” She’s absolutely right. Whether you’re a SaaS provider or a client of SaaS vendors, this outage should prompt some uncomfortable internal questions: What was turned off or bypassed (WAF, bot protections, geo blocks), and for how long? What emergency DNS or routing changes were made, and who approved them? Did people shift work to personal devices, home Wi-Fi, or unsanctioned SaaS providers to get around the outage? Did anyone stand up new services, tunnels, or vendor accounts “just for now”? Is there a plan to unwind those changes, or are they now permanent workarounds? For the next incident, what’s the intentional fallback plan, instead of decentralized improvisation? If you bypassed Cloudflare during this outage and you can’t answer these questions, you’re not ready for the next one. Final Thought I don’t know if any breaches will surface in the coming weeks or months that trace back to this outage window. I genuinely hope they don’t. But the possibility is real enough that it’s worth thinking through now — not after the fact, when you’re sitting in an incident review trying to figure out how an attacker got in through a gap that didn’t exist in early November. Defense-in-depth only works if you actually know what the layers are, who controls them, and what happens when several of them disappear at once. If a SaaS vendor is carrying your DDoS protection, WAF enforcement, bot mitigation, rate limiting, and abuse detection — and you didn’t even know it — that’s not defense-in-depth. That’s someone else’s architecture that you’re depending on without visibility or control. And when that architecture changes during an outage, you inherit the risk whether you knew about the change or not. This outage didn’t create that problem. It just made it visible. The post When Your Vendor Drops a Security Layer (And Doesn’t Tell You) appeared first on Cultivating Security.
-
4
Security Third: Why “Security First” Makes Organizations Less Secure
I heard something on a podcast the other day that’s been rattling around in my head ever since. The hosts were talking about Mike Rowe’s “Safety Third” concept — the idea that safety matters, sure, but treating it as the absolute top priority above everything else can actually make you less safe. Not because safety doesn’t matter, but because the “Safety First” mantra creates complacency. It makes people think someone else is responsible for their wellbeing. It replaces common sense and personal awareness with compliance theater. And listening to them explain it, I realized we’ve built the exact same problem in information security. The idea: declaring safety the absolute top priority makes people complacent. They stop thinking critically. They assume someone else made everything safe. That’s when accidents happen. That’s exactly what we’ve done with “Security First”. You hear it everywhere. Microsoft says “security comes first when designing any product or service” and tells employees “if you’re faced with the tradeoff between security and another priority, your answer is clear: Do security.” AWS states “cloud security at AWS is the highest priority.” Meta claims “safeguarding your data is our highest priority.” And just like the safety banners Rowe saw before being asked to do something dangerous, these declarations create a problem: they make people believe someone else is responsible for security. Over time, organizations become convinced that because they’ve said security is first, because they’ve implemented the tools and policies and compliance frameworks, they’re actually secure. They stop looking both ways. They trust that if the process allowed it, it must be safe. And that’s when things go wrong. Now look — for companies like Microsoft and AWS, maybe “security first” actually makes sense. They’re becoming the world’s infrastructure. Their product is security and availability. But the rest of us? The manufacturers, financial institutions, retailers, healthcare systems? Our business isn’t security. Our business is making things, moving money, serving customers, treating patients. Security enables that mission. It doesn’t replace it. And yet we keep demanding that security be first. We push for CISOs on boards. We complain that only 12% of S&P 500 companies have board directors with cyber credentials, that 19% of Fortune 500 companies don’t have a CISO. We point out that when Krebs on Security looked at the Fortune 100, only five companies listed a security professional on their executive leadership pages. We act like the problem is that security doesn’t have enough authority, enough budget, enough executive visibility. But what if that’s backwards? Whether you’re at a Fortune 500 with a CISO reporting to the board or a regional company where security reports to IT, the pattern is the same. What if demanding “security first” and a seat at the executive table is actually the problem—not because security doesn’t matter, but because it makes security someone else’s job? The CISO’s job. The board’s job. The security team’s job. Not everyone’s job. Maybe we should stop pretending security can actually be first — and start admitting that Security Third is closer to how this really works. The Complacency Problem Here’s what Rowe noticed on Dirty Jobs: he kept hearing “your safety is our top priority” right before someone asked him to do something objectively dangerous. Walk up a suspension bridge cable. Test a shark suit. Climb into a bosun’s chair hundreds of feet up. And over time, he and his crew started believing it. They started trusting that someone else had made everything safe for them. They stopped looking both ways before crossing the street because the sign said it was safe to cross. That’s when people got hurt. We’ve done the same thing in infosec. I’ve sat in meetings where leaders pointed at me and said “he makes us secure.” I’ve been in new hire orientations where I ask who’s responsible for information security, and the whole room points at me and my team. Once in a while, maybe once a year, someone will say “we all are” — and that’s the only right answer. I’m accountable for the security program. I build the framework, own the tools, manage the team. But every single employee is responsible for implementation. And yet somehow, despite years of trying different approaches, I still hear “he makes us secure.” I hear it from auditors. I hear it from business units in meetings with vendors. I hear it in leadership meetings. Every time I do, I know I’ve failed. Not because the program isn’t working, but because the entire world keeps saying “Security First” — which translates in people’s minds to “security equals the InfoSec team’s job.” I can stand in new hire orientations all day explaining that security is everyone’s responsibility, but I’m drowned out by every breach notification letter, every vendor pitch, every industry article reinforcing that security is handled by the security people. It’s the exact complacency Rowe warned about — his production crew stopped being careful because someone else on the jobsite was responsible for safety. And the industry reinforces this everywhere. Tools and policies and training modules (often terrible, hour-long videos that feel like OSHA’s greatest hits from 1987) that create the impression that if you follow the rules, you’re safe. If you’re in compliance, you’re protected. If you click the right buttons on the annual training quiz, you’ve done your part. Security is handled. But compliance isn’t the same as security. Never has been. The First Priority Fallacy Mike Rowe has a way of cutting through the noise on safety that applies perfectly to security. In a post explaining his “Safety Third” philosophy, he poses a question: Would you be OK if the government reduced the posted speed limits by 50%, required all motorists to wear helmets, and outlawed all left turns? If not, why not? Doing so would save almost 40,000 lives a year. His point? We’ve already come to terms with the human cost of driving the way we want to drive. We believe, collectively, that 40,000 annual deaths are an acceptable price to pay. We’ve made things safer with seat belts, airbags, and ABS brakes, but we haven’t done all we can to eliminate traffic fatalities. Nor will we. Because when it comes to driving, safety isn’t actually first—we’ve just decided it’s important enough to manage intelligently. Security works the same way. If security were actually first, businesses wouldn’t function. If we truly put security above all other considerations, we’d shut down internet connectivity, disable email, remove all cloud services, and send everyone home with a pen and paper. We’d achieve perfect security by doing absolutely nothing. Obviously, that’s absurd. But it exposes the lie in “security first.” What we actually mean is “security is important and needs to be considered in everything we do.” But that’s not as catchy, and it doesn’t fit on a banner. The problem with ranking values is that you end up with nonsense. If security is first, what’s second? Revenue? Customer service? Innovation? Employee satisfaction? And if those things conflict with security (which they will), do we always sacrifice them? Of course not. In the real world, we balance competing priorities every single day—just like we balance safety against the need to actually get somewhere when we get in our cars. I’ve sat in countless meetings where someone invoked “security first” to shut down a conversation—except I wasn’t in the room. Early in my tenure at one organization, an executive confronted me: why did I shut down a productivity tool their team desperately needed? I had no idea what they were talking about. I’d been there barely over a year and was still fighting to get invited to meetings. I pressed for details—not to assign blame, but because I needed to know what conversation had happened where I’d outright said no. There was no conversation. Security never said no. Someone in IT just assumed we would, and killed the idea before it could be evaluated. This is “security first” in action—used as a shield by people who didn’t want to implement another tool, didn’t want to take on the work, didn’t want the risk. Security became the convenient excuse. I finally convinced the executive to have their business leader approach me directly. They caught me in the hallway, gave me an elevator pitch. My response? “That sounds fantastic and like it would really help the team. Here are the areas I’d watch out for. Can you invite me to the next vendor meeting so we can run through this short list and figure out how to make this happen?” We implemented it. It worked. Fast forward five years. Same leader, very different scenario. They wanted to upgrade one of their customer-facing systems. This time, my team was on the project from the start. The test servers were built, and we started working through the implementation. That’s when we began peeling back the layers. First issue: we couldn’t get it to integrate with our authentication system. After troubleshooting, we discovered why—the application required LDAP communication over an unencrypted channel. No LDAPS support, no alternative. Then we looked at the logs. Unencrypted sensitive data everywhere—usernames, passwords, all of it in plaintext. As we dug deeper, we found the tech stack was built on end-of-life .NET components that hadn’t been supported in three years, with no plans by the vendor to update. It was actually worse than the system currently in production. This “upgrade” was a years-long step backwards in technology and architecture. I didn’t say no. I wrote a risk assessment. Ranked it against our published risk tolerance statements. We sat down with this business leader and walked through what we’d found. She shut it down herself—because she realized the vendor had sold them six-year-old software disguised as a new upgrade. Did we lose time? Yes. Did we invest resources in building test environments only to discover it wouldn’t work? Absolutely. But as a project management leader told me recently, “We figured out it wouldn’t work, and we—the company—made the call.” That’s the difference. Security was part of the process, not shoved down anyone’s throat. We provided information. The business made the decision. People still say security shut it down. But ultimately, it was that business unit leader who made that choice based on the risks we helped her understand. That’s not security first. That’s security in its proper place—informing decisions, not making them. Here’s the thing: the business exists to do business. To make money, serve customers, deliver value. Security exists to enable that mission, not to replace it. When security becomes the top priority, you’re no longer running a business—you’re running a compliance department with a product attached. The Human Factor There’s another problem with “security first,” and it’s the same one Rowe identified with “safety first”: it absolves individuals of personal responsibility. When you tell people that security is someone else’s job—that the security team, or the tools, or the policies are what keep things safe—they stop thinking critically about their own actions. They trust the guardrails. They assume that if the system let them do something, it must be okay to do it. But humans are the biggest variable in any security program. Not because they’re stupid or malicious, but because they’re creative, resourceful, and focused on getting their jobs done. And when your security controls get in the way of that, they will find a way around them. A peer in my industry shared their environment with me: their organization doesn’t allow general web browsing. At all. No Google. No Yahoo. No search engines. Every single website must be explicitly whitelisted before anyone can access it. When I asked how their company researches new products or evaluates vendors, they just shrugged. Think about what that actually means. You’ve got knowledge workers who need to do their jobs, and you’ve made it impossible to do basic research on company equipment. So what happens? They take work home. They use personal devices. They email documents to personal accounts. They find workarounds—because the business still needs to get done. The organization thinks they’ve locked down their perimeter. In reality, they’ve just pushed their data outside the perimeter where they have zero visibility and zero control. They’ve made themselves less secure while feeling more secure. That’s the Fort Knox mentality—build the walls so high that your own people have to tunnel under them. I’ve seen variations of this everywhere. Recent research shows the problem is getting worse, not better. A 2024 CyberArk study found that 65% of employees bypass cybersecurity policies to boost productivity Help Net Security, while 74% said they would bypass cybersecurity guidance if it helped them achieve a business objective Help Net Security. The consequences are predictable: when organizations make security controls too restrictive, hospital technicians share login sessions to avoid five-minute authentication delays Medium, and 75% of information workers share corporate data via personal email and cloud accounts, jumping to 87% for senior managers Software Connect. Healthcare faces particular challenges, where healthcare professionals rely on mobile apps or cloud platforms to manage patient information without following IT security guidelines Reco, contributing to 90% of healthcare institutions experiencing at least one security breach in recent years. You can’t manage humans the way you manage infrastructure. You can’t firewall rule your way out of human behavior. And trying to do so—building tighter and tighter controls, more and more restrictions, deeper and deeper monitoring—doesn’t make you more secure. It makes you more brittle. Worse, it drives risky behavior underground where you can’t see it, can’t measure it, and can’t help mitigate it. The alternative isn’t to give up and let chaos reign. It’s to recognize that security works best when it enables work, not when it becomes an obstacle to overcome. Instead of asking “can we do this securely?” start asking “how can we do this securely?” That’s a subtle change, but it completely reframes the relationship between security and the business. You’re no longer the Department of No. You’re a partner in finding solutions. That’s what Security Third actually looks like in practice. Not security as an afterthought, but security as a realistic, balanced consideration alongside the other priorities that keep the business running. When security becomes something people work with instead of around, you’ve actually made your organization more secure. What Security Third Actually Means Let me be very clear: Security Third doesn’t mean security doesn’t matter. It doesn’t mean we should ignore risks or abandon controls or let compliance slide. It means we need to be honest about what security actually is. Security is not a state you achieve. It’s not a checklist you complete. It’s not something someone else does for you. Security is a practice. It’s a way of thinking. It’s a continuous process of understanding risk, making informed decisions, and staying aware of what’s actually happening in your environment. When I joined a company to build their first formal information security department, they already had some security components in place—mostly unmanaged, but they existed. One of those components was security training. During my own onboarding, I sat through an almost hour-long video on phishing and other security topics. It was painful. Think those OSHA safety videos from the ’80s and early ’90s—bad acting, outdated scenarios, the kind of thing that makes you want to check your watch every three minutes. I died a little inside watching it. And in that moment, building a real security awareness program jumped from somewhere in my top 20 priorities to my top 5. The first thing I did was kill that video. Instead, I built a presentation and showed up in person to every single new hire class. For almost five years, I personally delivered security awareness education to every new employee on their first day. We also had computer-based modules and other training components, but even to this day, during new hire orientation, one or two members of my team are in that room talking through security awareness in person. And every year for the last few years, our Training Department has tried to take my hour-long slot down to 30 minutes. Or 15 minutes. I get it—they’re trying to streamline onboarding, make it more efficient. So every year, we have a conversation about what we’re actually trying to accomplish. Is the goal to check a compliance box, or is it to actually change behavior? Because if it’s the former, cut me to 15 minutes and run a video. If it’s the latter, here’s why we need that hour. Every year, they’ve agreed that effectiveness matters more than efficiency. And every year, I’ve kept that hour. Because cutting it down and relying solely on computer-based modules is compliance theater. It’s not effective security awareness. It’s checking a box so someone can say “we did training.” That’s the “security first” mentality in action—treating security as a requirement to satisfy rather than a practice to build. Not because it scales better than a video—it doesn’t. But because it works better. When you’re in the room, it becomes a conversation instead of a compliance checkbox. People ask questions. They share concerns. They start thinking about security as part of their job, not as something the security team handles for them. They report suspicious emails not because they’re required to, but because they understand why it matters. That’s Security Third. It’s the recognition that security works best when it’s embedded in everything you do, not when it’s treated as the top priority that overrides all other considerations. It’s about being present, being practical, and being honest about what actually makes people more secure. The Organizational Implications Here’s where this gets uncomfortable for a lot of security practitioners: if security isn’t first, then we’re not the most important team in the organization. We’re not the final word on what does or doesn’t happen. We’re advisors, partners, enablers—not gatekeepers. And that’s hard to accept, especially in an industry that’s spent the last two decades trying to get a seat at the table, trying to get executive buy-in, trying to get budget and headcount and authority. We’ve fought to make security matter. And now I’m suggesting we need to step back? Not exactly. I’m suggesting we need to be honest about what matters and why. Security matters because the business matters. The customer data we hold matters. We protect things because those things have value. We manage risk because unmanaged risk threatens the mission. But we’re not the mission itself. This has real implications for how we build programs. It means we can’t design security architectures in isolation from business needs. It means we can’t implement controls just because they’re best practices or compliance requirements. It means we need to understand what the business is actually trying to do and figure out how to make that happen securely, not how to make it so secure it can’t happen at all. And here’s where I need to be honest: I haven’t figured this out. I’ve worked in organizations where security is “first” in theory but last in practice. Where every project timeline includes a buffer for “security delays.” Where business units hide initiatives from security until the last possible moment because they assume we’ll kill them. Where we’re perpetually fighting fires because we’ve lost visibility into what’s actually happening. Some parts are good, some parts are terrible. Others improve after a breach forces the conversation. Some are better because leadership genuinely gets it. It’s always a mixed bag. I’ve explained repeatedly that we need to be in the strategic planning sessions—not the big end-of-year kickoff where decisions are already made, but the early meetings where the business starts making tactical decisions about new markets, new initiatives, new directions. That’s where security needs a seat. Not to say no, but to start thinking ahead. If I know the business is moving into a new market or launching a new product line, I can start planning—mentally, architecturally, in my own year-over-year roadmap. What tools will we need? What knowledge should my team develop? What risks should we be ready to address? But more often than not, despite executives saying “we take our customers’ data security seriously,” I’m left out of those conversations. And I’m back in reactive mode: “Oh, this business unit just bought X? Oh crap. Okay, let me figure this out.” I still don’t have a perfect answer for how to fix this. In the last few years, it’s gotten better—but not because leadership suddenly decided security should be first. Other business units started complaining that they didn’t know what was happening across the organization. The contact center finds out sales launched something new when customers start calling about it. Marketing discovers a new product line exists when the contact center asked what sales did and to get the marketing literature for it. When leadership started fixing that broader communication problem, security got pulled into those earlier conversations too. But there are still areas where I find out after the fact, where I’m constantly chasing decisions that have already been made. If you’re reading this and you’ve made real strides on this—getting security the visibility and lead time to understand what’s coming, to gauge risk early, to identify the gotchas before they become project delays, to prepare the tools or start applying existing ones to the new initiative—I’d genuinely like to know how you did it. Share the book, the article, the framework, the conversation that worked. The rest of us are still figuring it out. Because here’s the thing: the earlier we know, the less we become a roadblock. When I find out a business unit bought a new SaaS tool the day before go-live, I am the delay—not because I want to be, but because we’re not ready. We haven’t evaluated it, we don’t know how it integrates, we haven’t figured out logging or access controls or data flows. But if I know six months out? We can work through all of that in parallel. Security stops being the thing that slows you down and becomes the thing that helps you move faster. Maybe that’s just the life we have. But it’s definitely not the life of ‘security first.’ What I do know is this: Security Third doesn’t mean we accept being left out. It means we stop using ‘security first’ as a weapon and start making the case for why involving security early makes the business faster, not slower. It means proving, project by project, conversation by conversation, that we’re there to enable the mission, not block it. It’s messy. It’s frustrating. And it’s the reality most of us are living in. The Compliance Trap One of the biggest lies we tell ourselves is that compliance equals security. Get your ISO cert, pass your SOC 2 audit, check all the NIST CSF boxes, and you’re secure, right? Not even close. Early in my career, I started on a Data Security & Compliance team. My entire day revolved around compliance minimums and passing the next audit. I learned the frameworks inside and out. I documented controls. I prepared evidence. I sat through audits. And I learned something critical: compliance is about meeting minimum standards. It’s about documenting that you’ve done the things you said you’d do. It’s valuable—I’m not arguing against compliance frameworks—but it’s not the same as being secure. I’ve seen organizations that were fully compliant and completely compromised. I’ve seen audit reports that gave clean bills of health to environments that were actively being exfiltrated. I’ve seen companies pass penetration tests in June and get breached in July. SOC 2 has become a marketing checkbox more than a security assessment. When engagements are funded by marketing departments rather than security or risk teams, you’re not measuring security posture—you’re producing sales collateral. The framework allows vendors to scope out entire trust criteria while still claiming compliance. You can skip Privacy and Confidentiality entirely and still be ‘SOC 2 compliant.’ That’s not assessment—that’s selective reporting. And before you think I’m just being cynical: security practitioners have been documenting this problem for years. One former auditor writing on Medium put it bluntly: “Passing an audit doesn’t mean you’re secure. It means you’re auditable.” When auditors arrive, he wrote, “it’s not a test of security posture — it’s a theater production. Policies are dusted off, controls are ‘shown to exist,’ and everyone plays their part.” CPA firms have warned about the “SOC 2 rubber stamp crisis”—auditors rushing through assessments to hit rock-bottom price points, sometimes with inquiry-only testing that violates AICPA standards. One CPA firm documented ultra low-cost auditors offering reports $5,000-$10,000 while hiding failed peer reviews from the public AICPA database. And the framework itself allows companies to pick and choose their scope, to ignore any of the four additional trust criteria beyond security, and still walk away with a “passing SOC 2.” (Only the Security criterion is mandatory—Availability, Processing Integrity, Confidentiality, and Privacy are all optional. What that means is that a vendor handling your customer data can elect to skip Privacy and/or Confidentiality entirely and still be “SOC 2 compliant.”) That’s not security assessment. That’s theater. PCI DSS? Too many of my years have been consumed by PCI. It’s a needed foundation—I get why it exists and what it’s trying to accomplish. But the implementation is maddening. I’ve seen QSAs pass companies that were clearly below the bare minimum. I’ve also seen QSAs who expect audit perfection and treat every gap like a catastrophic failure. The inconsistency is the problem. Here’s the reality: compliance is often about checking boxes, not about understanding and managing real risk. It’s about having a policy, not about whether that policy actually works. It’s about documenting the control, not about whether the control is effective. I think the frameworks are all needed—they help point to the bare minimum. But no one framework is right for all companies and industries. CSF is probably the closest to being universally applicable, but even that requires translation and adaptation to your specific environment. The Security Third mindset changes how you approach compliance. Instead of asking “what do we need to do to pass the audit?” you ask “what do we need to do to actually be secure?” Sometimes those align. Often they don’t. Real security comes from understanding your environment, knowing what you’re protecting and why, having visibility into what’s actually happening, and being able to respond when things go wrong. None of those things are compliance requirements. All of them are harder than compliance. Making It Work So what does this actually look like in practice? First, stop treating security as a binary. Nothing is perfectly secure. Nothing is completely insecure. Everything is cyber risk management. Your job isn’t to eliminate all risk—it’s to help the organization understand and manage information security risk in the context of what they’re trying to accomplish. But here’s the thing: you can’t do that without a baseline. Does your executive team, your board, your C-level leadership actually have a documented statement of what level of cyber risk they’re willing to accept? If not, stop what you’re doing and get them to document this. This becomes your guiding light for determining if something is risky, too risky, or “not what I would do, but it’s not horrible.” Without this, what are you even securing to? Why? What level of security is required? You have no baseline. You’re making it up as you go. Second, build security into workflows instead of bolting it on. If your security process requires people to stop what they’re doing, fill out a form, wait for approval, and then continue, you’ve already lost. They’ll find a way around it. Instead, figure out how to make the secure path the easy path. This ties back to getting that seat at the table for strategic conversations. If I’m part of the planning process, I can help build security in from the start. But that’s where I still struggle—not because of me, but because business leaders don’t see a need or reason why I should be there. They think I’m protecting them so they can do whatever they want, and I’ll just… protect them somehow. That’s not how this works. Third, focus on visibility and awareness over control and restriction. You can’t control everything—and trying to makes you blind. But you can instrument everything, monitor what matters, and respond quickly when things go wrong. This means two things: technology and people. On the technology side, invest in a solid SIEM—one you can actually afford and will actually, and can use. Log everything you can—but know why you’re logging it. Some Windows Event logging can be extremely noisy with no real value. Do you need that? Probably not. Don’t just log for logging’s sake because someone said “log everything.” Know the why. Revisit it often. Yes, there’s a cost-benefit analysis here—the more you log, the more visibility you have, but also the more expensive it gets and the more noise you have to filter through. You’ll need to make trade-offs based on what matters most to your environment and what your budget allows. Some of it you won’t be able to alert on in real-time, and that’s okay. Think of airplanes—they have flight data recorders not to prevent crashes, but to understand what happened when things go wrong. That’s what your SIEM is. When an incident happens, you need the data to explain how and why it occurred so you can shore those areas up. Without an effective, immutable SIEM, you’re left guessing. On the people side, this is where that security awareness program we talked about earlier matters. When people understand why security matters and what to look for, they become your sensors. They report the suspicious email. They flag the weird request. They ask questions before clicking links. You can’t monitor every human interaction, but you can make humans part of your monitoring system. Visibility isn’t just about tools—it’s about creating an environment where both your technology and your people are watching for the things that matter, and where you can actually see what’s happening in your environment instead of building walls and hoping nothing gets through. Fourth, invest in people—and make sure they understand the mission. Not just training, but actual development. Help your team members understand not just what the rules are, but why they exist. Give them context. Make them partners in security, not subjects of it. But here’s the critical part: your own team needs to understand what the mission actually is. All too often, we hire someone into the security team, usually transitioning from another IT function, and they interview well—but we miss that they have a preconceived notion of what security is. Lock everything down. Control the human. That’s not security—that’s a prison. They become the naysayer. ‘That’s not secure!’ Okay, to what standard? At what risk level? Compared to what baseline? How does turning this place into Fort Knox help the business accomplish anything? I’ve seen this play out multiple ways. I had someone at one organization—maybe two jobs ago—accuse me to executive management of putting the company at risk because I wasn’t taking security seriously. Why? Because I wouldn’t lock down the firewall the way he wanted. If we’d done what he advocated for, we would have crippled half the business on day one. He didn’t understand the role or the why. He just wanted to secure everything. At another organization where I started the InfoSec team, the network admin had apparently applied for my role. I got hired instead, and there was constant friction. His perspective on security was to lock everything down, control every bit of traffic, implement strict content filtering, and control the human. I was trying to enable the human. I advocated for less restrictive content filtering—just block the big bad stuff HR would freak out over and the other obvious threats. We butted heads constantly because we had fundamentally different understandings of what the job was. Your team needs to understand that security exists to enable the business, not replace it. If they don’t get that, they’ll create the exact ‘Department of No’ problem we’re trying to avoid. Fifth, be honest about limitations—especially when those limitations are outside your control. Don’t promise perfect security. Don’t tell people that following the policy makes them safe. Don’t create the illusion that someone else is responsible for their security. Make it clear that security is everyone’s job, including theirs. And be honest when you can’t protect something anymore. The last two years, my industry—like a lot of organizations—has been hell-bent on moving everything to “The Cloud.” We’re highly regulated. Our auditors and examiners really like the program we’ve built. It provides reassurance that we’re protecting what we can control. Our on-prem environment has been audited. Our examiners like what they see. But when the business keeps buying SaaS solutions, once the data leaves our environment, I lose all insight. I lose control. I lose the ability to respond. I can’t protect data we’ve entrusted to that vendor the way I can protect data in our own data center. So I’ve gotten hard on vendors. I ask tougher questions than apparently a lot of my peers do. That’s caused friction. But when the business asks why, I explain: we are secure in our data center. We have no control once the data goes to that vendor. It’s an area of contention because the vendor is supposedly the expert. They built their system. They secure the data. But during due diligence, I’ve seen vendors scope their SOC 2 assessments to ignore three of the four trust criteria. And no one cares. The business sees a passing SOC 2 and thinks “secure.” I see a vendor that scoped their assessment to avoid scrutiny and I have no way to verify what’s actually happening to our data. This is where honesty matters most. I can’t stand in front of leadership and say “we’re secure” when half our data is sitting in vendors I can’t audit, can’t monitor, and can’t control. The best I can do is say “here’s what we control, here’s what we don’t, and here’s the risk we’re accepting by going this route.” That’s not a popular message. But it’s an honest one. And in a Security Third world, honesty about what you can and can’t do is more valuable than false assurances that everything is fine. The Uncomfortable Truth Here’s what I’ve learned over the years: the organizations with the best security are rarely the ones with the most security tools or the strictest policies or the biggest teams. They’re the ones where security is ingrained in how people work. Where it’s common sense, not compliance. Where people understand the threats and think critically about risk without needing to be told. You can’t mandate that. You can’t policy your way there. You can’t achieve it by making security the top priority and overriding everything else. You get there by making security third. Not unimportant—third. Important enough to always consider, but not so important that it replaces judgment, common sense, and personal responsibility. That’s uncomfortable for a lot of security practitioners. We’ve been trained to think that if we just had more budget, more authority, more executive support, we could finally make things secure. But that’s the same fallacy as “safety first.” It creates the illusion that security is something someone else does for you. The truth is, security is something we all do together. Or we don’t do it at all. I didn’t come to this realization by reading another information security white paper or sitting through another conference keynote. I came to it by listening to a podcast that had nothing to do with InfoSec or technology. Someone mentioned “Safety Third,” and the host explained what that meant—Mike Rowe’s concept from his experiences on construction sites and dirty jobs. That moment has been rattling around in my head ever since, because in that explanation, someone articulated something I’d been struggling with for fifteen years in information security but couldn’t quite name. Rowe innovated Safety Third not to diminish safety’s importance, but because “safety first” was creating the exact opposite of what it promised. It was making people less safe by making them complacent, by convincing them that safety was someone else’s job, by replacing personal awareness with compliance theater. We’ve done the same thing with “security first.” And reflecting on the last fifteen years in this field—the breaches, the compliance theater, the reactive firefighting, the battles for strategic seats we never quite get—it’s time to admit it: we need to adopt Security Third. Not as a catchy slogan. Not as a way to downplay security’s importance. But as an honest acknowledgment of how security actually works and what actually makes organizations more secure. That conversation—the honest one about what security actually means and how it actually works—is worth having, even if it makes people uncomfortable. Especially then. Because the alternative is what we have now: organizations that say security first, act like security last, and wonder why they keep getting breached despite all their compliance checkmarks and all their tools and all their policies. Maybe it’s time to just admit what we’re actually doing and build from there. Maybe instead of fighting to be first, we should fight to be third. Because right now, in practice, we’re last. And moving from last to third—to a place where security is consistently considered, embedded in how work gets done, and honestly acknowledged as one of several critical priorities—would be a massive step forward. Security Third. Not because it doesn’t matter, but because pretending it’s first hasn’t been working. The post Security Third: Why “Security First” Makes Organizations Less Secure appeared first on Cultivating Security.
-
3
The Marquis Breach: What Happens When Your Vendor’s Security is Worse Than You Think
I was winding down my workday last week when one of my analysts posted a link in our team chat—another BleepingComputer article about a data breach. This one was different, though. Marquis Software Solutions, a vendor I’d never heard of, had just disclosed that attackers had compromised data from 74 financial institutions and over 780,000 customers. That evening, I started digging. Could this happen to us? How did this actually happen? What patterns was I seeing that I’d seen before? What I found made me realize we had a case study worth digging into—and boy oh boy, there’s a lot we can learn from it. Fair warning: this is going to be a long, in-depth analysis with several turns. But stick with me, because this incident highlights multiple systemic risks that every organization with SaaS vendors needs to understand. Marquis Software Solutions is a Plano, Texas FinTech company founded in the mid-1980s, providing marketing automation, CRM, compliance reporting, and data analytics to over 700 financial institutions. Founded in the mid-1980s, Marquis likely began with on-premises software—cloud infrastructure wasn’t exactly widespread back then. Over time, as hosting and subscription models became viable and attractive, the company appears to have evolved toward a SaaS-like offering, now marketing itself as a central data platform provider. As a SaaS provider today, client institutions share customer data—names, addresses, SSNs, account numbers, financial data—to Marquis’s centralized cloud environment. One vendor, hundreds of institutions worth of sensitive data, all aggregated in one place. According to public breach-notification filings and reporting by security news outlets, the breach exposed personal and financial data of customers across 74 banks and credit unions — reportedly over 780,000 individuals (BleepingComputer, SecurityWeek, Comparitech). In post-breach disclosures, Marquis states it has implemented enhanced security measures, including firewall patching, enabling multi-factor authentication on VPN/firewall accounts, rotating passwords and deleting unused accounts, increasing log retention, applying geo-IP filtering on remote access, and deploying endpoint detection and response (EDR) tools. (Emery Reddy, Main Attorney General, Iowa Attorney General) Reviewing the list of controls in those notices, these aren’t advanced defenses — they’re baseline security measures that should have been in place years ago. Any financial institution walking into an FFIEC exam with these gaps would be facing serious supervisory findings. But then I came across CoVantage Credit Union’s notification to the New Hampshire Attorney General, and the language they used tells the real story. The filings do not explicitly state these controls were missing before the breach. However, the specific verbs used — enabling MFA, deploying EDR, applying lockout policies, increasing log retention — indicate these were new implementations, not enhancements of mature, existing controls. If Marquis had been strengthening or expanding established safeguards, the disclosures would typically use words like enhanced, expanded, or improved. Instead, the verbs used across multiple disclosures — CoVantage’s filing, Marquis’s own statements quoted in GovInfoSecurity, and independent analysis from SOCRadar — strongly suggest these controls were not previously in place or were not consistently enforced.” Even the Iowa Attorney General filing submitted by Marquis’s own counsel uses the same first-time implementation language. It states that Marquis has ‘implemented additional security technologies and processes’ since the incident — again, not ‘enhanced’ or ‘expanded’ existing controls, but implemented them. That word choice aligns with the CoVantage AG filing (‘enabling MFA,’ ‘increasing logging retention,’ ‘applying lockout policies’), Marquis’s own notification quoted by GovInfoSecurity, and the remediation steps summarized by SOCRadar. Across all disclosures, the verbs point to the same conclusion: these were new deployments of foundational controls, not improvements to an existing security program. We’ll dive deeper into what this language reveals about their security posture later, but the verbs alone tell you these weren’t enhancements—they were first-time implementations of controls that should have existed for years. This breach demonstrates a structural problem I’ve been wrestling with for years: SaaS vendors operate as largely unregulated data aggregators while we—the financial institutions—bear full accountability for breaches that happen entirely outside our control. When one SaaS provider gets compromised, it instantly becomes a multi-state, multi-institution incident. And class action lawsuits are now targeting both the breached vendor AND the financial institutions, arguing that “they” “failed to adequately vet or oversee the vendor.” (Brown v. Marquis Software Solutions, Inc. et al (CoVantage + Marquis), Geoffrey v. Marquis Software Solutions, Inc. (CoVantage + Marquis), Erban v. Marquis Software Solutions, Inc. et al (Gesa Credit Union + Marquis)) The Regulatory Reality: You’re Accountable, Period Let me be absolutely clear: federal regulation leaves zero ambiguity about who is accountable when a vendor breach occurs. It’s us — the financial institutions. Under the GLBA Safeguards Rule (16 CFR § 314.4(f)(2)), financial institutions must select and oversee service providers and ensure they maintain appropriate safeguards. NCUA articulates the same expectation for credit unions — a principle equally applicable across the sector: “Credit unions are responsible for safeguarding member assets and ensuring sound operations irrespective of whether or not a third party is involved.” And the FFIEC’s June 2023 Interagency Guidance drives the point home for all banking organizations: “A banking organization’s use of third parties does not diminish its responsibility to meet these requirements to the same extent as if the activities were performed in-house.” Customers don’t have relationships with your vendors — they have relationships with you. When a vendor gets breached, you face the customer notifications, you face the regulators, and you may face litigation alleging “inadequate vendor oversight.” That accountability never shifts, even when the root cause lives entirely outside your environment. The Due Diligence Failure That Should Terrify Everyone Here’s what should keep every CISO and business leader awake: current vendor due diligence is clearly failing us. Think about Marquis for a moment—a 40-year-old company that likely underwent hundreds of due diligence engagements and SOC 2 reviews over the decades. Based on the AG filings we analyzed, they clearly didn’t have adequate protections in place to secure sensitive data for 780,000+ customers. If standard due diligence processes failed to identify these fundamental gaps at a mature vendor serving 700+ regulated institutions, what does that say about our ability to assess vendor risk across our entire portfolio? Strategic Questions Every Organization Must Answer This breach forces uncomfortable conversations that leadership can no longer avoid: Risk Classification: When SaaS vendors hold customer PII in their infrastructure, should they be governed by your cyber risk tolerance or your vendor risk tolerance standards? Because most organizations treat these very differently. Verification Standards: What level of evidence-based verification do you require beyond attestations and SOC 2 reports? Marquis proves that standard assurance mechanisms aren’t sufficient. Ongoing Visibility: How do you demonstrate ongoing visibility into vendor security posture between annual assessments? How do you detect deteriorating controls before they lead to breaches? Risk Thresholds: What’s your acceptable risk threshold for vendors holding customer SSNs and financial account data? More importantly, how do you articulate and defend that threshold when regulators ask? The Impossible Balance These aren’t academic questions—they require real decisions with real trade-offs: Security assurance depth versus vendor onboarding speed Evidence-based verification versus attestation acceptance Operational costs versus breach prevention investments Business agility versus risk visibility The challenge: SOC 2 reports are proving less effective as assurance mechanisms, yet requiring more rigorous verification creates friction with business units and vendors. How do you navigate this gap while meeting regulatory expectations for meaningful oversight? The Marquis breach demonstrates that our current approach isn’t working. The question is what we’re going to do about it. What We’re Going to Examine The Marquis breach isn’t just another vendor incident—it’s a case study in systemic failures that every organization with SaaS dependencies needs to understand. To be clear: this isn’t an attack on Marquis or the 74 financial institutions that used their services. They’re dealing with the same structural challenges we all face. I’m using this breach because it just happened, it’s well-documented, and it illustrates patterns I’ve been seeing across the industry for years. In the analysis that follows, we’ll dissect how this breach happened, why standard protections failed, and what it reveals about the broader risks facing financial services. This could have been any SaaS vendor, any set of financial institutions—the underlying issues are industry-wide, not company-specific. This is going to be comprehensive. We’ll examine the attack timeline, the regulatory implications, the litigation patterns emerging from vendor breaches, and the fundamental security architecture problems that make these incidents inevitable. Most importantly, we’ll analyze what this means for how you assess, contract with, and monitor SaaS vendors going forward. Analysis Sections: Incident Timeline and Breach Mechanics – The 74-day notification delay, CVE-2024-40766 exploitation, and what the timeline reveals about vendor incident response maturity and the patching problems that keep repeating. Why This Breach Matters Beyond Marquis – The structural accountability problem we all face, how standard assurance mechanisms are eroding, and why SOC 2 scoping has become a marketing exercise rather than meaningful security validation. The “Bare Minimum” Security Evidence – What post-breach remediation reveals about the actual control gaps that existed, and how vendor security evidence often obscures rather than illuminates real risk. Scale of Impact: The Multiplier Effect – How SaaS concentration risk works in practice, the visibility gaps that make detection impossible, and why the current innovation-without-accountability model is unsustainable at industrial scale. The Lock-In Problem: Why Timing Matters – The fundamental challenge of vendor risk management timing, how leverage erodes post-contract, and why the startup promise problem makes due diligence increasingly meaningless. Strategic Implications for Vendor Risk Management – Why contractual protections aren’t actually protection, the classification problems that matter for risk assessment, and the impossible position leadership faces when accountability doesn’t match control. Conclusion: The Systemic Nature of SaaS Vendor Risk – The broader implications for how we approach vendor dependency at scale, and why individual organizational solutions can’t address industry-wide structural problems. Each section builds on the previous analysis, but you can jump to specific areas based on your immediate concerns. The goal isn’t to provide easy answers—it’s to give you the data and analysis needed for informed strategic decisions about vendor risk in your environment. Let’s start with what actually happened and why it took so long for anyone to find out about it. Incident Timeline & Breach Mechanics Timeline of Failure August 14, 2025 — Breach occurs September 2025 — Marquis engages Rapid7 (GovInfoSecurity) September–October 2025 — Forensic investigation underway Late October 2025 — Scope confirmation completed (inferred from Oct 27 CU notifications) October 27, 2025 — Marquis notifies financial institutions November 2025 — Additional security controls implemented (per Iowa AG filing) November 26, 2025 — Individual consumer notifications begin December 2–4, 2025 — AG filings become public This timeline exposes a critical accountability gap that I see in vendor contracts constantly. Credit unions are required to notify NCUA within 72 hours once they reasonably believe a reportable cyber incident has occurred, and to complete formal reporting within 10 days. But that belief is entirely dependent on when the vendor chooses to tell us an incident has happened. Here’s the problem:Most vendor contracts still use vague language like “as soon as reasonably practicable,” with no definition of what counts as discovery. Vendors and their legal counsel often interpret “discovery” as the point when the forensic investigation is completed and the scope is validated — not when the compromise is first detected. Under this interpretation, Marquis could argue that the 74-day delay in notifying its 74 financial-institution clients was “reasonable,” because the investigation was “still ongoing.” But that creates an impossible situation for us:Regulators expect rapid notification once we become aware of a reportable cyber incident — yet our ability to become aware is entirely dependent on vendor discretion. If a vendor waits 74 days before disclosing unauthorized access and exfiltration of customer data, then for 74 days we cannot: meet our regulatory reporting obligations, assess customer risk, initiate protective measures, or even confirm that an incident occurred. The regulatory clock doesn’t pause for vendor investigations — and our compliance depends entirely on when a vendor decides the incident is “validated” enough to tell us. Attack Vector and Exploitation Based on public reporting, the Marquis intrusion aligns with the broader Akira ransomware campaign targeting SonicWall devices in mid-to-late 2024. According to BleepingComputer, Akira operators were exploiting a SonicWall zero-day beginning in early September 2024 to steal VPN credentials and one-time passcode (OTP) seeds, allowing them to bypass MFA entirely. Although SonicWall later released patches, many organizations—likely including Marquis—were still compromised because patching alone did not reset stolen credentials or OTP seeds. BleepingComputer reports that Akira continued signing into SonicWall VPN accounts even when MFA was enabled, strongly indicating that the attackers had extracted the underlying OTP seeds during earlier exploitation. The Patching Problem That Keeps Repeating Let me step back for a moment, because this incident highlights something I see over and over again in breach reports. All too often, I read about incidents that “could have been prevented with proper patching”—but that’s only part of the story. In this case, based on BleepingComputer’s analysis, patching alone wouldn’t have fully resolved the issue. Remember Spectre? Microsoft patched the CPU vulnerability, but did you know there was a registry key that needed to be set as well? Otherwise, it didn’t really get fixed. I know this because for two years I went back and forth with my IT counterparts: “Still shows vulnerable.” “But I installed the patch!” “Did you set the registry key?” “But I installed the patch!” “YOU NEED TO SET THE REG KEY!” Here’s the reality: patching is hard, patching is tedious, and hardening is even harder. But when it comes to vulnerabilities and patches, you need to read the patch notes—not just assume “I patched it, therefore I fixed it.” This SonicWall vulnerability required organizations to not only apply the patch but also reset all potentially compromised credentials and OTP seeds. How many organizations actually did the full remediation? Even fully patched SonicWall devices remained vulnerable if credential and seed resets didn’t occur, enabling attackers to authenticate long after the vulnerability itself was closed. This is exactly why we need our leadership teams and business counterparts to understand that we need the time to do patch management right—not just check a box that says “patched.” I especially feel for the small IT and security teams dealing with this. When you have 20 people in IT who all have day jobs, there’s constant friction around patching timelines. But use examples like Marquis to build the business case: “Here’s what happens when we don’t get remediation right.” Automate what you can, but read those patch notes and validate that fixes are actually complete. (I’ll dive deeper into practical approaches for resource-constrained teams in a future piece.) The lesson isn’t just “patch faster”—it’s “understand what complete remediation actually requires and give security teams the time to do it properly. The Ransom Payment That Solved Nothing A now-deleted filing by Community 1st Credit Union – also reported by Comparitech – revealed that “Marquis paid a ransom shortly after 08/14/25″—yet customer data was still compromised and exfiltrated. This likely explains the 2.5-month notification delay. One interpretation is that Marquis management probably believed that paying the ransom resolved the incident without requiring disclosure. It was only after engaging legal counsel and forensic investigators—who discovered the actual extent of data exfiltration and unauthorized access—that the obligation to report became unavoidable. The forensic investigation most likely revealed what attackers don’t disclose when they’re collecting ransom payments: the full scope of network compromise, lateral movement, and data theft. Paying ransom may stop the immediate encryption, but it doesn’t evict the attacker or close the vulnerabilities that enabled initial access. Here’s something that should concern every CISO: threat actors typically aim to maintain persistence. If a target paid once, they’ve demonstrated willingness to pay again, making them an attractive target for future attacks. Timeline with Forensic Engagement: August 14, 2025: Initial breach detected, ransom paid shortly after September 8, 2025: External forensic expert engaged to identify affected individuals October 27, 2025: Investigation completed, financial institutions notified (74 days post-breach) November 26, 2025: Individual customer notifications began (104 days post-breach) What This Timeline Reveals About Incident Response Maturity The 25-day gap between breach detection and engaging forensic experts tells me everything I need to know about Marquis’s incident response preparation: Strongly suggests they had no effective incident response plan. A mature IR plan triggers immediate forensic engagement upon detecting compromise. You don’t wait three and a half weeks to call in experts. This timeline strongly indicates a lack of in-house security expertise. Organizations with dedicated security teams recognize that paying ransom doesn’t eliminate breach notification obligations. Someone should have known this immediately. This delay suggests they had no forensic retainer. The delay suggests they spent weeks finding a firm, negotiating contracts, and getting investigators on-site. A $10-20K annual retainer ensures immediate response when breaches occur. For a company serving 700+ financial institutions, this should have been table stakes. Likely over-reliance on cyber insurance. Many organizations depend solely on insurance-provided forensics, but insurers prioritize assessing coverage eligibility—verifying you maintained those attested controls—not expediting response. Most cyber policies now mandate MFA, segmentation, and password rotation. Sound familiar? This demonstrates a reactive rather than proactive security posture. Marquis treated the incident as “contained ransomware” until forensic analysis revealed the full scope of data compromise—indicating they may have fundamentally misunderstood breach notification requirements despite serving 700+ regulated financial institutions. For a 40-year-old vendor in the financial services space, this level of incident response immaturity is inexcusable. WHY THIS BREACH MATTERS BEYOND MARQUIS The Marquis breach matters not because of Marquis specifically, but because it demonstrates a pattern I’ve been seeing across nearly all SaaS providers in financial services. The Structural Problem We All Face SaaS vendors aren’t regulated like we are. They operate outside FFIEC/NCUA regulatory scrutiny and are bound only by contractual terms, not mandated cybersecurity standards. Most prioritize speed-to-market and customer acquisition over cybersecurity maturity, operating with thin security programs and minimal detection capabilities. Here’s what really gets me: vendors create concentration risk at scale. When attackers compromise a single SaaS provider, they don’t hit one institution—they hit a data hub serving dozens or hundreds of institutions simultaneously. Marquis served 700+ institutions; one breach became 74+ institutional incidents instantly. And guess who bears the accountability? Not the vendor. When breaches occur, customer impact letters reference us. Members call us, not the vendor. Regulators examine us, not the vendor. We provide credit monitoring, manage notification requirements, and absorb reputational damage—even when the breach occurred entirely outside our environment. The Accountability That Never Goes Away Let me be crystal clear about something: regulators and the law leave no ambiguity about third-party accountability. It’s ours, period. Regulators and the law are clear about third-party accountability: GLBA Safeguards Rule: 16 CFR § 314.4(f)(2) requires financial institutions to oversee service providers by “requiring your service providers by contract to implement and maintain such safeguards.” Since we select and contract with vendors on behalf of customers, we’re accountable for their data protection practices. NCUA Third-Party Risk Guidance: NCUA Letters to Credit Unions 07-CU-13, 24-CU-02, and 25-01: “Credit unions are responsible for safeguarding member assets and ensuring sound operations irrespective of whether or not a third party is involved.” Hiring a vendor doesn’t transfer our accountability—it adds risk we must manage. Seven out of ten cyber incidents reported by credit unions involved third-party vendors. FFIEC Interagency Guidance on Third-Party Relationships (June 2023): “A banking organization’s use of third parties does not diminish its responsibility to meet these requirements to the same extent as if its activities were performed by the banking organization in-house.” Management is responsible for developing and implementing third-party risk management policies, procedures, and practices. Choosing a SaaS vendor doesn’t transfer the risk; regulators still expect us to oversee the vendor. Translation: Hiring a vendor doesn’t transfer our accountability—it adds risk we must manage. “They’re our vendor” is not a defense when breaches occur. What Makes This Case So Instructive Here’s what should terrify everyone: Marquis likely underwent hundreds of due diligence reviews from the 700+ institutions they serve. They almost certainly had SOC 2 certification. Financial institutions had contracts with security requirements. Yet security controls appear to have been inadequate. This raises the uncomfortable question: if standard vendor assurance mechanisms failed this spectacularly, what does that say about our entire approach to vendor risk management? The Erosion of Standard Assurance Mechanisms The Marquis breach exposes several emerging challenges in vendor risk management that extend beyond any single vendor: SOC 2 Scope Control and the Marketing Department Problem Here’s something that drives me nuts: vendors control what gets audited in SOC 2 assessments. They select specific Trust Services Criteria while excluding others, or define narrow system boundaries that conveniently exclude critical infrastructure. But here’s the real problem—SOC 2 has become a sales and marketing tool rather than a security validation mechanism. I see compliance platforms openly marketing SOC 2 as a way to “close larger customers in less time,” “shorten sales cycles,” and “turn compliance into a growth strategy.” In many organizations I’ve worked with, SOC 2 initiatives are funded by marketing and sales departments—not security or IT—because certification has become a gateway for landing deals. Think about the incentive structure here: when revenue teams control security audit scope and budget, the goal becomes passing audits efficiently to unlock sales, not comprehensively validating controls. This creates predictable outcomes that I see over and over: narrow audit scopes that exclude inconvenient systems, “SOC 2-compliant packages” designed to pass audits efficiently, and vendor resistance to expanding scope beyond bare minimums. A SOC 2 report confirms that what was audited met standards at a point in time—it doesn’t guarantee comprehensive coverage, operational effectiveness, or continuous compliance. But try explaining that to a business unit that wants to onboard the vendor yesterday. The Due Diligence Resistance Pattern When security teams attempt evidence-based verification beyond attestations, I see the same predictable pattern play out every time: Vendor: “Your requirements are more stringent than other institutions. These questions are too burdensome.” Business unit: “The vendor is complaining that IT/InfoSec/Vendor Management is creating barriers and delaying timelines.” And guess what happens? Due diligence gets constrained to email Q&A with questions submitted days in advance, allowing completely scripted responses. We can’t determine if vendors actually have dedicated security staff, can’t assess fourth-party risk from outsourced security, and can’t verify that documented practices match reality. Here’s the maddening part: when breaches occur, the first question is always “Why didn’t due diligence identify these gaps?” This question conveniently ignores all the constraints that got placed on the verification process. The Marquis case is the perfect example of this dysfunction. The vendor served 700+ financial institutions over nearly 40 years, undergoing hundreds of due diligence reviews. Yet post-breach filings reveal they lacked MFA on VPN accounts, proper password rotation, adequate logging, geo-IP filtering, and EDR deployment. If hundreds of due diligence processes failed to identify these baseline gaps, the real question should be: “Why didn’t the vendor implement fundamental controls despite serving 700+ regulated institutions?” Instead, accountability falls on security teams for not catching deficiencies that vendors deliberately obscured through prepared responses and attestations. This dynamic puts us in an impossible position: we have accountability for outcomes but no real authority to verify controls. It’s maddening, and it’s exactly how we end up with situations like Marquis. The Definitional Manipulation Problem Here’s another pattern that drives me absolutely crazy: vendors routinely redefine industry-accepted security terms to match whatever they already have built, rather than actual industry standards. “SSO” suddenly means session persistence within their application, not federated identity using SAML or OAuth. “MFA” becomes password plus security questions, not true multi-factor authentication per NIST SP 800-63B standards. I’ve seen vendors claim they have “encryption at rest” when they mean database password protection, or “network segmentation” when they mean VLANs with no access controls. The problem is, without evidence-based verification before contracts are signed, you discover these definitional games during implementation—when your leverage to demand corrections has completely evaporated. By then, you’re stuck explaining to leadership why the “SSO integration” they thought they were paying for doesn’t actually work with your identity provider, or why their “MFA” doesn’t meet your security standards. I’ve been in too many post-implementation meetings where vendors suddenly clarify what they “actually meant” by the terms they used in sales presentations. It’s infuriating, and it’s completely predictable if you know what to look for. The “Trust the Expert” Defense Oh, this one really gets under my skin. When concerns are raised—whether from security, IT, compliance, or risk management—objections are inevitably met with: “The vendor is the expert in their domain. They’ve been doing this for years. They have more resources and expertise than we do. We should trust their judgment.” This logic fundamentally misunderstands accountability. Look, if we outsource a function, we still must maintain sufficient expertise to properly govern and manage that relationship. We can outsource execution, but we absolutely cannot outsource understanding the design decisions, validating how it’s implemented, and verifying it operates correctly. Here’s a scenario I see all the time: we sign contracts with two different vendors and direct them to integrate their solutions. To properly govern this, we need to understand why that integration approach was chosen, how it’s being implemented, and whether the result actually meets our requirements and security standards. Without that understanding across the full lifecycle—from design through implementation to ongoing operation—we can’t effectively govern the relationship or manage the risks we’re still accountable for. This requires maintaining expertise sufficient to ask tough questions and verify vendor claims. Without that capability, we can’t determine whether controls actually exist, whether we’re being oversold capabilities, or when vendors are failing to protect our data. Deferring entirely to vendor expertise doesn’t transfer risk—it creates blindness to risk. And guess who still gets blamed when things go wrong? Not the vendor we “trusted.” The Marquis Reality Check Let’s talk about what should really concern everyone. Marquis has operated since 1986–1987—nearly four decades—and serves more than 700 financial institutions. Their 2021 acquisition by Rockbridge Growth Equity indicates substantial institutional backing and an expectation of mature operational practices. On paper, organizations with this longevity, scale, and customer base should have well-developed security programs aligned to industry frameworks. But as we explored earlier, post-breach filings reveal that Marquis implemented foundational controls like MFA, EDR, proper logging retention, and geo-IP filtering after the incident. The language used—ensuring, deploying, applying—strongly suggests these were first-time implementations of baseline security measures. So here’s the question that should keep every CISO awake: If a 40-year-old provider serving 700+ regulated institutions didn’t appear to have these basic controls in place, what does that say about the effectiveness of: Standard vendor due diligence questionnaires, which routinely ask about MFA, least privilege, and log retention? SOC 2 examinations, which specifically evaluate access control, monitoring, and change management? Contractual security requirements, which almost always mandate “appropriate administrative, technical, and physical safeguards”? The industry assumption that vendor longevity and market presence equate to security maturity? These questions aren’t about attacking Marquis—they’re about recognizing a systemic failure in how we assess vendor risk. The Systemic Challenge This pattern repeats across the vendor landscape—large established providers, mid-sized companies, and new startups alike. This problem isn’t unique to any single vendor type or maturity level. Current due diligence approaches are proving insufficient to identify these gaps, yet those of us responsible for vendor oversight face significant constraints that I see every day: IT, InfoSec, and vendor management teams operate with limited resources while managing expanding vendor portfolios Vendors resist rigorous scrutiny, characterizing thorough security validation as “burdensome” or “more stringent than other customers require” Business units pressure for faster onboarding, viewing security diligence as friction rather than protection Vendors themselves aim for minimum viable compliance rather than comprehensive security maturity Here’s the paradox that drives me crazy: The same assurance mechanisms that failed to prevent this breach are the mechanisms we’re forced to rely on. When we attempt deeper verification, we encounter resistance from both vendors (who view it as burdensome) and business units (who view it as delaying operational objectives). Yet regulatory accountability remains entirely ours, and customer impact is entirely ours when breaches occur. We bear full accountability for vendor failures while having limited ability to verify the security posture of the vendors we’re forced to trust. We face a system where attestations and questionnaires are accepted as sufficient evidence—despite repeated demonstrations that they are not—because the alternatives face resistance from all sides. When customer data is processed in a SaaS environment, our security capabilities become significantly limited. We lose direct visibility, can’t perform our own event detection, and can’t enforce or validate the maturity of vendor controls or logging. However, during a breach, regulators, customers, and legal actions still hold us accountable. We can transfer operational tasks to vendors, but accountability never leaves our desk. This creates an impossible position: How do we fulfill our accountability for vendor security when we’re constrained from verifying vendor security beyond attestations that have repeatedly proven insufficient? THE “BARE MINIMUM” SECURITY EVIDENCE Post-Breach Remediation Reveals Control Gaps According to data breach notifications filed with state Attorney General offices, Marquis implemented the following controls after the breach: Updating and patching firewall devices Rotating passwords for local accounts Deleting unused accounts Enabling multifactor authentication for all firewall and VPN accounts Increasing logging retention for firewall devices Applying stricter account lockout policies and geo-IP filtering Deploying endpoint detection and response (EDR) tools Source: Data breach notifications filed with Maine, New Hampshire, Iowa, Texas, and Massachusetts Attorney General offices, as documented in CoVantage Credit Union’s November 26, 2025 filing to the New Hampshire Attorney General and reported in American Banker and BleepingComputer. What the Language Reveals The AG filings do not explicitly state the controls were absent before the breach. However, the specific language used—enabling MFA, deploying EDR, applying lockout policies—strongly suggests these were new implementations rather than enhancements to existing controls. If Marquis had been strengthening or expanding existing measures, the filing would have used language like “enhanced,” “strengthened,” or “expanded.” Instead, the verbs suggest these controls were not previously in place or not consistently enforced. The specific areas addressed in post-breach remediation indicate gaps in: Multi-factor authentication coverage: “Enabling” MFA for firewall and VPN accounts suggests it was not previously enabled Password and account lifecycle management: Rotating passwords and deleting unused accounts were identified as immediate priorities Network access controls: “Applying” geo-IP filtering and lockout policies indicates these protections were not active Endpoint visibility: “Deploying” EDR tools suggests Marquis lacked endpoint detection capabilities Logging and monitoring: Increasing retention indicates prior logging was insufficient for forensic investigation These Controls Represent Long-Standing Industry Baselines The controls identified in AG filings are not advanced or emerging security practices—they represent foundational expectations established and widely adopted for years: CIS Critical Security Controls: MFA requirements for remote access and administrative accounts were strengthened in CIS Controls Version 7 (2016) and expanded further in Version 8 (2021) NIST Guidance: NIST began promoting MFA adoption as standard practice in 2016, with draft SP 800-63-3 recommending MFA for all assurance levels CISA Cybersecurity Performance Goals: CISA’s CPGs require MFA on all remotely accessible accounts, minimum 12-character passwords, and logging of all authentication attempts FFIEC IT Examination Handbook: Multi-factor authentication, password management, logging retention, and endpoint security have been core examination topics for financial institutions for over a decade These controls have been industry-standard expectations for 5-10+ years. Multi-factor authentication for remote access, password rotation, logging retention, endpoint detection, and account lifecycle management are routinely validated during financial institution examinations and are expected at even small community banks and credit unions. The Broader Implication Based on the language used in the AG filings, a vendor serving 700+ regulated financial institutions for nearly 40 years appears to have lacked baseline protections that would be unacceptable in even the smallest credit union’s environment. Any one of these controls could potentially have detected or prevented the breach. This pattern—where established vendors serving highly regulated industries lack foundational security controls that would be required of the institutions they serve—illustrates the structural challenge financial institutions face in third-party risk management. SaaS vendors often prioritize speed-to-market and customer acquisition, and unlike the financial institutions they serve, they operate outside regulatory examination frameworks that would validate baseline security maturity. The result: vendors may accumulate sensitive data from dozens of regulated institutions while maintaining security programs that would not meet the minimum standards expected of those same institutions. SCALE OF IMPACT: THE MULTIPLIER EFFECT The SaaS Reality: Innovation Without Accountability The financial services industry is undergoing significant transformation. SaaS adoption is accelerating, FinTech startups are proliferating, and “cloud-first” has become the default strategy. The likelihood of finding modern, feature-rich solutions that can be deployed on-premise is diminishing rapidly. This shift is not hypothetical—it’s the operational reality across the industry. Over the past 8-9 years, this trend has intensified: core functions, customer-facing services, compliance tools, and data analytics are migrating to vendors operating in “the cloud.” This creates an important question the industry has not fully confronted: Have we thought through the implications of this dependency? Can we trust these vendors? Should we trust them implicitly? Many SaaS vendors—particularly newer FinTech entrants—prioritize innovation and speed-to-market. Their expertise lies in product development, user experience, and rapid iteration. But expertise in technology does not equal expertise in security. Many of these companies are funded with the explicit goal of building to an acquisition target, not building for long-term operational resilience. They’ve discovered an industry that implicitly trusts them to “do the right thing” with sensitive customer data—but that trust is often misplaced. The industry cannot afford to accept this imbalance passively. Yes, there are well-resourced companies that invest heavily in security infrastructure and operational maturity. But the barrier to entry in SaaS is low, and regulatory scrutiny of vendors remains minimal compared to the institutions they serve. The result: financial institutions bear full accountability for vendor failures while having limited ability to verify vendor security practices. Concentration Risk in Practice When attackers compromised Marquis, they didn’t breach one institution—they breached a data hub acting as shared service provider for hundreds of organizations. Over 780,000 individuals were impacted across 74 banks and credit unions, demonstrating how vendor consolidation transforms individual security failures into systemic operational risks. But Marquis isn’t even the worst example we’ve seen. In May 2020, Blackbaud—a cloud/SaaS provider for nonprofits, universities, healthcare organizations, and more—was hit by ransomware. Attackers exfiltrated data before Blackbaud paid the ransom and claimed to have blocked further unauthorized access. The breach impacted thousands of organizations worldwide that used Blackbaud’s services, with many customers subsequently having to notify donors and users of data exposure. The SEC later charged Blackbaud for failing to reasonably safeguard personal information and not adequately disclosing the breach to investors. One vendor’s security failure became thousands of organizations’ crisis management problem. Marquis served over 700 institutions, meaning compromise of one SaaS provider instantly becomes a multi-state, multi-institution incident, a class-action magnet, and a systemic operational risk. A breach of a SaaS provider with broader reach—such as core banking platforms, CRM systems, loan origination systems, or digital banking providers—would have exponentially greater impact. The Visibility Gap Financial institutions often have limited insight into vendor security posture. Banks and credit unions lack direct visibility into a vendor’s internal security controls, patch management status, VPN usage, or incident detection capabilities. Once customer data leaves our environment for a SaaS service, we lose visibility, event detection capability, the ability to enforce logging fidelity, and cannot validate internal control maturity. This creates an unbalanced risk model: the vendor controls the security environment, but the financial institution bears the consequences of failure. The Path Forward: Accountability Must Match Dependency If the financial services industry is moving irreversibly toward SaaS and cloud-based solutions—and all evidence suggests it is—then the industry must collectively raise the bar for vendor security standards. We cannot allow the convenience of modern technology to override our fiduciary responsibility to protect customer data. This requires: Rejecting implicit trust: Vendors may be innovative, but innovation does not equal security maturity Demanding evidence, not attestations: SOC 2 badges and questionnaire responses are insufficient Holding vendors accountable contractually: Security failures must have meaningful consequences Elevating vendor governance: SaaS vendors holding customer PII should be governed with the same rigor as internal systems Otherwise, breaches like Marquis will continue, and financial institutions will continue bearing the reputational, regulatory, and financial consequences while vendors face minimal accountability beyond potential contract disputes. Historical Context: We’ve Seen This Before — The Dot-Com Boom and the Return of “Security Last” To understand why today’s SaaS ecosystem carries so much unmitigated risk, let me take you back to something we should have learned from already—the dot-com boom. In the late 1990s, during the dot-com boom, companies raced to capture market share, deploy new features, and get acquired before their competitors. Technical debt was accepted. Security debt was ignored. The prevailing strategy was simple: grow fast, exit fast, and let someone else figure out the long-term risks. Very few companies built with operational resilience in mind because resilience didn’t help valuations—momentum did. Investors rewarded speed, not security. Regulations were minimal. And consumers lacked the awareness to demand better. Innovation flourished, but it did so on a fragile, insecure foundation that collapsed as quickly as valuations did. Fast-forward twenty-five years, and the same incentive structure has re-emerged—this time within SaaS and FinTech. SaaS Is the Dot-Com Boom at Industrial Scale The parallels are hard to ignore: Rapid market capture over disciplined security engineering VC pressure to iterate quickly and aim for acquisition Minimal regulatory oversight of vendors, even those handling sensitive data Widespread, unexamined trust by financial institutions The only real difference is that today’s technologies operate at a magnitude the dot-com era never imagined. A single SaaS breach no longer affects thousands; it affects millions. Vendors no longer host simple static websites; they host core operational functions for regulated institutions. The surface area and blast radius have expanded exponentially, while the underlying culture of “security is someone else’s problem” remains largely intact. The financial institutions, and specifically the credit union’s role in this dynamic warrants examination. Credit unions have increasingly invested in FinTech innovation through CUSO venture structures and strategic partnerships, funding startups designed for rapid growth and acquisition. While this drives innovation, it also creates a paradox: financial institutions fund vendors optimized for speed and market capture, then bear full regulatory accountability when those same vendors experience security failures. The industry is simultaneously investing in and being harmed by the same incentive structures that prioritize momentum over security maturity. Why This History Matters This moment is not an anomaly—it’s a continuation of a structural pattern that appears whenever innovation significantly outruns accountability. SaaS, like the dot-com predecessors before it, is expanding faster than governance frameworks can adapt. The credit union industry faces a unique paradox: through investments and strategic partnerships, credit unions are funding the creation of vendors built with the same incentive structures they will later be held accountable for managing. The industry is both investor and victim, simultaneously enabling and suffering from vendor security failures. If the industry does not internalize the lessons of past failures, it will repeat them—only at a scale that transforms isolated vendor weaknesses into sector-wide operational risks. Here’s how I’d revise this section with more practitioner voice: THE LOCK-IN PROBLEM: WHY TIMING MATTERS The Vendor Risk Management Timing Challenge Here’s a pattern I see over and over again: organizations involve security teams too late in the vendor selection process. By the time security gets to review vendor controls, contracts are nearly signed, business units have committed to timelines, and relationships have been established. This creates a predictable disaster: sales teams build rapport and trust with business units, demonstrating features and capabilities that address operational needs. Business units fall in love with the solution. Only then does security get involved and begin asking basic vetting questions—questions that should have been asked on day one. When security identifies gaps or concerns, we’re now disrupting an established relationship rather than preventing a problematic one from forming. Discovering inadequate security during implementation leaves only bad options: accept the risk, implement compensating controls we have to fund and maintain, or attempt costly contract renegotiation. Meanwhile, security teams get blamed for “causing problems,” “disrupting vendor relationships,” and “making waves”—despite identifying risks we’re contractually obligated to manage. The animosity isn’t because security asked hard questions; it’s because we asked them too late, after emotional and operational investment had already occurred. The earlier security gets involved, the easier it is to walk away. Initial vetting questions don’t require deep technical expertise—they require asking basic questions before anyone falls in love with the solution. When security can flag concerns before relationships solidify, the conversation becomes “let’s find a better vendor” instead of “you’re blocking our project.” Pre-Contract vs. Post-Contract Leverage Before signing, we hold all the power: Vendors want our business, we can require security controls as contract conditions, we can walk away, and security gaps are negotiable deal points. After signing, leverage evaporates: Vendors already have our revenue, security improvements require their investment with no return, switching costs create dependency, and business units resist changing established workflows. The Contract Erosion Problem Here’s something that really frustrates me: many vendor relationships predate the SaaS security awareness that emerged over the past 8 years. I regularly review legacy vendor contracts that are 6+ years old with no security requirements or breach notification language whatsoever. In many cases, relationships began when solutions were on-premises, then migrated to cloud/SaaS delivery models—but contract language was never updated to reflect the massive change in data custody and risk profile. Without contractual security obligations, vendors have no binding duty to notify within specific timeframes, there’s ambiguity about incident response responsibilities, security control requirements are undefined or outdated, and we have no contractual recourse for security failures. When vendors lack explicit notification requirements, they default to their own timelines—which prioritize legal review and investigation completion over our notification obligations. This may explain why breach notification delays of 74+ days occur even when regulators expect 72-hour internal notification and 10-day reporting. The Startup Promise Problem Here’s where this gets particularly frustrating with startup vendors. Startups often make promises their sales teams can’t back with actual security maturity. Once contracts are signed, fixing those gaps suddenly isn’t a priority anymore. Funding cycles, technical debt, and M&A pressure mean “security later” becomes “security never.” I’ve watched this pattern play out repeatedly: enthusiastic sales presentations about security features that turn out to be roadmap items, not reality. By the time you discover the gap during implementation, they’re focused on the next funding round or acquisition target—not on delivering the security controls they promised. The Critical Takeaway Initial due diligence represents our highest-leverage moment. Security must be involved before vendor selection, requirements must be contract conditions, and “we’ll implement that next quarter” must trigger dated contractual milestones or disqualification. CISA gets this. Their Secure by Demand guidance released in August 2024 was brilliant—essentially a call to all regulated industries, small and large, to unify and demand these vendors do the right thing. Build security in from the start, extend visibility so we can actually see what’s happening in their environments, demand better or walk away from them. I love the message, I love the idea, and I’ll stand behind it completely. But here’s the reality: since it got released, I don’t see much tangible change. Change takes time, I get that, but until a vast majority of us start asking those demanding questions—and we’re actually supported by our business units when we do—vendors won’t feel enough pressure to change their approach. The last thing I want is another law trying to fix this. Just look at HIPAA—that didn’t solve much, caused a lot more administrative burden, and we’re still seeing healthcare-related breaches regularly. We need market pressure, not regulatory mandates that create compliance theater while missing the actual security problems. But due diligence isn’t one-time: contract renewals and service model changes must trigger security reassessment and contract updates. Legacy vendor relationships with outdated contract language provide no protection when delivery models have fundamentally changed. Without pre-contract leverage and ongoing contract maintenance, we accept whatever security posture vendors choose—which may be inadequate even for vendors serving hundreds of regulated institutions for decades. Here’s how I’d revise this section with more practitioner voice: STRATEGIC IMPLICATIONS FOR VENDOR RISK MANAGEMENT Why “Contractual Protections” Aren’t Actually Protection Here’s something that drives me crazy: SaaS vendors typically operate outside regulatory scrutiny and are only bound by contractual terms—not by mandated cybersecurity standards. Contractual clauses provide legal recourse after a breach, but they do absolutely nothing to prevent the breach itself. Insurance coverage and indemnification don’t restore customer trust or prevent regulatory scrutiny. The Core Problem We Can’t Solve Vendors create the risk through inadequate security, but we absorb the regulatory and reputational impact. This is vendor risk management’s fundamental paradox: accountability can’t be outsourced, risk can’t be transferred, yet we remain fully responsible for outcomes we can’t directly control. The Classification Problem That Matters When SaaS vendors host customer data in their infrastructure, is this a vendor risk or a cyber risk? The distinction matters more than most people realize for governance, and it reveals a real and known governance flaw in how many institutions operate. Some institutions categorize SaaS as “vendor risk,” which often has: Lower board scrutiny Higher acceptable risk thresholds Weaker metrics and reporting Less rigorous evidence requirements Slower escalation paths Meanwhile, cyber risk tolerance is usually much stricter because regulators expect it. We have examination standards to meet, compliance requirements to satisfy, and customer data protection obligations that don’t disappear just because we’re using a vendor. Here’s what’s actually happening: When we engage a SaaS vendor, we’re choosing to outsource infrastructure management, not accountability. The vendor’s data center becomes an extension of our infrastructure—we’ve simply chosen to have someone else manage it. But we retain full accountability: we chose to entrust customer PII to this environment, we chose to accept their security decisions, and we bear all consequences when their security fails. This isn’t my opinion—it’s the only correct interpretation from a supervisory standpoint. The data doesn’t care where it lives, and the regulatory and operational risks are identical whether it’s in our data center or theirs. The Strategic Question Leadership Must Answer Should SaaS vendors handling customer PII be governed by our cyber risk tolerance standards rather than vendor risk tolerance standards? If we wouldn’t accept inadequate MFA, logging, or patching in our own environment, why would we accept it in a vendor’s environment that holds our customer data? The answer should be obvious: SaaS vendors with customer PII must be held to the stricter standard. Anything else is a governance failure that regulators will eventually identify and address. The Impossible Position Here’s the mismatch that keeps me up at night: We have accountability for security outcomes but no authority over security management when it comes to SaaS solutions. We can’t mandate that the vendor patches their systems, can’t enforce MFA on their accounts, can’t control their logging standards, or direct their incident response. Yet when breaches occur in the vendor’s environment, the impact is identical to breaches in our own: Same sensitive data compromised (SSN, account numbers, financial data) Same customer impact and notification requirements Same regulatory consequences and examination findings Same reputational damage and trust erosion Same legal liability exposure The critical difference: In our data center we control remediation; in the vendor’s environment we depend on their priorities, their timelines, and their investment decisions. The SaaS Governance Challenge Accountability without authority—that’s the position we’re in. When customer data is processed in a SaaS environment, our security capabilities become significantly limited. We lose direct visibility, can’t perform our own event detection, and can’t enforce or validate the maturity of vendor controls or logging. However, during a breach, regulators, customers, and legal actions still hold us accountable. We can transfer operational tasks to vendors, but accountability never leaves our desk. The Strategic Question Leadership Must Answer Should SaaS vendors handling customer PII be governed by our cyber risk tolerance standards rather than vendor risk tolerance standards? If we wouldn’t accept inadequate MFA, logging, or patching in our own environment, why would we accept it in a vendor’s environment that holds our customer data? We’ve outsourced infrastructure—but regulators, customers, and the law hold us accountable as if we still control it. CONCLUSION: THE SYSTEMIC NATURE OF SAAS VENDOR RISK SaaS vendors are now one of the highest systemic risks we face as financial institutions. The Marquis breach provides the model we should expect to see repeated: Vendor with inadequate security controls gets breached Weak internal detection allows prolonged attacker access Widespread impact across multiple financial institutions Customer data directly compromised Our reputations get damaged Lawsuits claiming we failed to govern the vendor But this assumes the breach actually gets disclosed. The Marquis timeline itself suggests a more concerning risk: a 2.5-month delay between breach detection and client notification, with that now-deleted filing indicating ransom payment shortly after the incident. This delay pattern suggests Marquis may have initially believed the incident could be contained without disclosure—only reporting after forensic investigators and legal counsel determined notification was unavoidable. A mature, 40-year-old company serving 700+ institutions, with access to legal counsel and forensic expertise, still delayed notification for 74 days while regulators expected 72-hour internal reporting. Now think about the SaaS startup with seven full-time employees, no dedicated security team, no forensic retainer, and no experienced legal counsel on speed dial. When they detect suspicious activity, who makes the call about whether it’s “reportable”? Who determines the scope? Who tells them they’re legally obligated to notify client institutions? The barrier to “deciding” an incident doesn’t require disclosure is far lower when there’s no mature incident response program, no legal team pushing for transparency, and significant financial incentive to avoid reporting a breach that could destroy the company. Here’s the reality that should concern every one of us: we lose visibility the moment data leaves our environment. We depend entirely on vendor detection capabilities, vendor judgment about disclosure obligations, vendor access to competent legal counsel, and vendor willingness to report incidents that could trigger contract penalties, damage reputation, or—in the case of startups—end the business entirely. The more concerning scenario is one we may never know occurred: a vendor concludes no disclosure obligation exists, and customer data is compromised without notification ever reaching us. We don’t control the security maturity of SaaS platforms, but we’re fully accountable for the data we send to them. We can’t verify whether breaches occur, yet we bear full regulatory and reputational consequences when they do. This imbalance defines modern vendor risk. Another major SaaS breach will occur—the question isn’t if, it’s when. The financial services industry is moving irreversibly toward cloud-based solutions and SaaS dependencies. We can’t reverse this trend, but we can choose how we govern it. The Marquis breach demonstrates that standard assurance mechanisms—SOC 2 certifications, security questionnaires, contractual indemnification—are insufficient to prevent significant control gaps at vendors serving hundreds of regulated institutions. The cost of inadequate vendor governance gets measured in breach response, regulatory enforcement, litigation defense, credit monitoring expenses, and permanent reputational damage. We can’t outsource accountability—only infrastructure. The strategic questions this breach raises—about risk tolerance, due diligence rigor, contractual requirements, and the balance between vendor selection speed and security visibility—don’t have simple answers. But they’re questions leadership must address, because the alternative is accepting that breaches like Marquis will continue, and that we’ll continue bearing the full consequences while vendors face minimal accountability beyond potential contract disputes. The post The Marquis Breach: What Happens When Your Vendor’s Security is Worse Than You Think appeared first on Cultivating Security.
-
2
Willful Ignorance as a Security Vulnerability
Saturday evening. Long day of side projects and farm work. The corporate work week was done, but I’d been grinding through accounting, blog writing, development work—all the side-business stuff that fills weekend hours. I was contemplating just getting out of the house for a while. There was snow in the forecast, but maybe I could run out, grab a pizza at the local joint, have a beer, watch whatever game they had on TV, and maybe even interact with other humans for a bit. As I was trying to disconnect from my work brain, I was feeding the dogs and bantering with an AI the way I sometimes do—just to break up the day, have fun, and see where the conversation leads. I think I was talking to it about where I should go eat, the snow in the forecast, and the fact that it wasn’t supposed to start until 9 PM. It was only 5 PM—I had plenty of time to get out, relax, and unwind while I waited for the dogs to come back inside. I tossed out a harmless joke: You know how weather forecasters are. The only job in the world you can be wrong 100% of the time and still keep your job. Just weekend humor. But the AI didn’t let it slide. It pushed back with something like: The weatherman joke is classic, but honestly, they get more grief than they deserve. Still, you’re not wrong—it’s one of the few jobs where being publicly wrong is just part of the gig. Imagine if your farm accounting had that kind of tolerance for error. That made me pause. So I replied: Yeah, and actually the forecast does help me plan… but in my day job running an InfoSec program, if I’m wrong even once? What do you think happens? And right there—in the middle of trying not to think about work—I accidentally tripped into the blog post I didn’t know I was looking for. Willful Ignorance—from my perspective and the 20+ years I’ve been in IT and Security, I’m not sure if anyone has really directly talked about this, but I’ve observed it through behaviors and management practices throughout my career. Willful ignorance is when an organization chooses to avoid information that would force them to take responsibility for risk. It’s not about lacking resources or time—it’s about the deliberate decision to stay uninformed because knowing would create uncomfortable obligations. For those early in their careers, it helps to understand the organizational forces that drive this behavior. Leadership faces competing priorities where security often loses to immediate business needs. Budget constraints create tension when fixing problems costs money upfront. There’s genuine fear of accountability—once you officially know about a risk, you own it. Add in office politics where being the messenger of bad news can hurt your career, and the cognitive discomfort of confronting how vulnerable you really are. Understanding these dynamics helps explain why otherwise smart people make seemingly irrational decisions to avoid security information. The Weatherman Paradox Think about how we treat weather forecasts. Meteorologists are wrong regularly. We joke about it. We expect it. We laugh when they call for sunshine and we get drenched anyway. But we still check the forecast every single day. Even imperfect information helps us plan: We decide what to wear. We adjust outdoor plans. We carry umbrellas “just in case.” We make informed choices even when the information isn’t perfect. We recognize something important: Knowing something—even if it’s uncertain—is more valuable than knowing nothing. That’s the paradox:We accept uncertainty in weather forecasting because we know it still improves outcomes. People choose to know, even when the knowledge might be wrong. This analogy matters because both fields operate in uncertainty—but only one punishes you for being wrong once. The InfoSec Reality: No Room for Error Now flip the analogy to cybersecurity. In InfoSec, being wrong once can be catastrophic.One missed vulnerability? Ransomware.One overlooked misconfiguration? Data theft.One misinterpreted alert? Attackers get weeks of free access. The 2024 Verizon Data Breach Investigations Report confirms what we see in the field—exploited vulnerabilities now account for 14% of breaches, nearly triple the rate from 2022.And the math is brutal: Defenders must be right nearly 100% of the time. Attackers only need to succeed once. Industry data shows this repeatedly. The Mandiant M-Trends Report documents how initial footholds frequently come from a single misconfiguration or compromised account, often leading to weeks or months of attacker dwell time.The consequences aren’t “oops, I got caught in the rain.” They’re: Operations shut down Millions lost to recovery Regulatory fines Lawsuits that drag on for years Reputational damage that haunts an organization for a decade or longer Despite these stakes, I see the same pattern across industry after industry: People choosing not to know. The Dangerous Choice: Willful Ignorance This isn’t passive ignorance—it’s active. It shows up in statements like: “Don’t tell me about vulnerabilities I can’t fix right now.” “We’re too small to be targeted.” “We haven’t been breached yet, so we’re fine.” “Let’s skip the penetration test this year; things are busy.” “We don’t need logging on that system… nothing sensitive is on it.” Early-career InfoSec professionals encounter this constantly and often blame themselves. If you’ve experienced this, you’re not alone. It’s real. It’s widespread. And it’s dangerous. Willful ignorance manifests when: Leadership avoids vulnerability reports Business units buy tools without security review because they don’t want to be told ‘no’ IT teams delay assessments Budget committees deprioritize security every cycle Organizations decline to implement basic controls like MFA or logging This isn’t theoretical—Microsoft’s research shows MFA blocks over 99% of credential attacks, yet adoption remains inconsistent across industries: Why People Choose Ignorance The psychology of willful ignorance is simple, but it’s backed by research. Behavioral psychology research on cognitive dissonance, motivated reasoning, and willful blindness shows that people often avoid acknowledging security risks when the truth feels inconvenient, embarrassing, or politically costly—making willful ignorance itself a major vulnerability. Knowing creates responsibility. There’s also a practical reason leadership avoids “knowing” officially. Once you acknowledge a known risk in documentation, meetings, or formal communications, regulatory frameworks and legal liability often increase significantly. Courts and regulators judge organizations more harshly when they can prove you knew about a problem and chose not to act. Knowing creates responsibility. If you know your email server is unpatched and exploitable, you now have three choices: Fix it Accept the risk Admit you’re ignoring the risk Only one of these is comfortable. So people convince themselves that: “If it’s not documented, it’s not a problem.” “If we don’t run the scan, we don’t have to explain it.” “If we don’t know, we can’t be held accountable.” But here’s the truth: Ignorance doesn’t reduce risk. It only reduces accountability—until the breach. The Fatal Flaw Here’s what happens when organizations choose not to know and then get breached: Attackers stay hidden longer because no one is monitoring. Damage spreads further because nothing triggers containment. The IBM Cost of a Data Breach Report puts hard numbers on this reality: organizations take an average of 204 days to detect a breach and another 73 days to contain it. When the attack involves stolen credentials—the most common attack vector—that timeline stretches to 292 days. IR becomes chaotic, expensive, and reactive. Recovery takes longer, impacting every business function. Regulators and courts judge organizations more harshly when they can prove you knew about a problem and chose not to act—negligence is worse than error. Organizations that fall into this trap often experience the same painful outcomes—longer breaches, slower detection, and far more damage than necessary—all because the warning signs were ignored. And here’s the kicker: “We didn’t know” is not a defense.Not legally. Not operationally. Not ethically. The Gap Gets Wider Attackers study, practice, share techniques, and evolve. Every. Single. Day. Organizations that choose ignorance force their defenders to stand still. Mandiant’s latest research shows attackers now need only 11 days median dwell time to accomplish their objectives, while defenders using traditional approaches can take months to even detect the intrusion.The gap widens: Security teams miss new attack vectors because leadership won’t fund threat intelligence They fail to spot early indicators because monitoring tools are “too expensive” They’re forced to operate with knowledge gaps because assessments get declined And you can’t defend against threats when leadership refuses to let you study them. It’s the organizational equivalent of refusing to let the meteorologist check the weather while storms keep getting more unpredictable. The Weather Forecast Lesson (Revisited) Weather forecasts aren’t perfect. They never have been. They never will be. But we use them anyway because they increase preparedness. Security information works exactly the same way: Reports about new attacks aren’t perfect Vulnerability scans miss things Pen tests can’t replicate every scenario Security tools occasionally throw false positives But imperfect information still: Narrows risk Guides decision-making Improves detection Builds resilience The right question is never: “Is this information 100% accurate?” The right question is: “Am I better off knowing or not knowing?” In security, the answer is always knowing. What This Means for You Security practitioners come from all shapes and sizes—companies that are well-funded, excellent management that listens, early-career professionals, or those changing fields. Regardless of your situation, here’s what matters: This pattern is universal. Willful ignorance shows up everywhere—startups, Fortune 500s, government agencies. The psychology is identical: people avoid knowing because knowing forces action, accountability, and discomfort. Imperfect information beats willful ignorance every time. You can adopt weather-style thinking in security: directionally correct trumps perfectly accurate, probability beats certainty, and preparedness matters more than prediction. Learn to identify ignorance as a risk itself. Not knowing isn’t neutral—it actively increases dwell time, blast radius, response costs, and business impact. Recognizing this dynamic is half the battle. The career skill is diplomatic challenge. Junior analysts who learn to identify and tactfully address willful ignorance—without alienating stakeholders—develop an incredibly valuable capability. You’re not just pointing out problems; you’re helping organizations make better risk decisions. You now have vocabulary for the frustration. Being able to name what you’re seeing transforms helpless frustration into strategic action. When you can articulate why someone is choosing not to know, you can address the real barriers to better security. The Real Choice You can’t control whether you’re targeted. Modern attackers automate their targeting. It’s algorithmic, not personal. The Microsoft Digital Defense Report highlights a 32% surge in identity-based attacks, driven by automated credential theft and infostealer malware operating at scale. But you can control: Whether you’re prepared Whether you have visibility Whether you understand your environment Whether you can recover when—not if—a breach happens Choosing ignorance because knowledge is uncomfortable doesn’t change the threat landscape. It only guarantees you’ll be unprepared when the inevitable happens. And for early-career InfoSec pros: Learning to identify, communicate, and challenge willful ignorance is a core skill. Full Circle That Saturday evening forecast? It called for 2 inches of snow starting at 9 PM.When I woke up the next morning, there was… barely a dusting.The forecast was wrong. But it still helped me plan my evening: I dressed for the possibility of snow I made decisions with the best information available I accepted that uncertainty is part of the equation That’s how security awareness should work. Not perfection.Not absolute certainty.Just actionable clarity. And here’s where the analogy really matters for new InfoSec professionals: We’re not trying to build Fort Knox.Unlimited security budgets don’t exist.Perfect security doesn’t exist.And trying to lock everything down to the extreme just forces people to bypass controls. We only need to be: More prepared than before, and More prepared than the organization next to us, and Ready to act on imperfect information This is the art of security: Conveying risk in a way people can act on Helping leadership understand consequences without paralyzing the business Turning imperfect data into practical action Making informed decisions under uncertainty Meteorologists don’t just say “weather is coming.” They give you: Probabilities Timing Severity Expected impact They make imperfect information useful. We have to do the same thing in cybersecurity. Because at the end of the day: Choosing not to know doesn’t make risk go away.It just guarantees you’ll be unprepared when it shows up. The post Willful Ignorance as a Security Vulnerability appeared first on Cultivating Security.
-
1
Why Now? What 15 Years of Security Work Taught Me
Why I’m Writing This For the past few months, I’ve been writing more formal internal analysis pieces – breaking down incidents I see in threat intel feeds, public breach notifications, security news that crosses my desk. Nothing fancy, just trying to make sense of what I’m seeing and share it with my management team, my immediate team, and IT peers. Maybe a few others who want to read along. It started simple enough. I’d see an incident, write up what it meant for my company or our sector, what patterns I was recognizing. I’m technical – I speak technical – but I’m not exactly what you’d call a polished writer. About two years ago, I started using AI as a universal translator of sorts. I’d write something, then ask it to help me convert the technical bits for non-technical business people. Game changer. But here’s the thing – I spend more time checking AI output than most people realize. I read, re-read, make sure my actual message is still there. Too many people just hit send on whatever the AI spits out. Me? I’m making sure it’s saying what I want, how I want it said. AI helps me get what’s in my brain into better words, but the thinking – that’s still mine. Most of my emails get a once-over now for flow, clarity – did I actually answer the question I was asked, or did I just dump technical details and assume people would connect the dots? The analysis pieces got deeper. Started connecting more dots between incidents. About a year and a half ago, one of my staff pulled me aside and said I should be teaching this stuff – even suggested a platform for it. This wasn’t the first time I’d heard this – a few years earlier, another colleague had urged me to start teaching. A theme was developing. This latest suggestion came after he’d gone through Northwestern’s cyber bootcamp, so him asking me to teach wasn’t just enthusiasm – he’d seen the difference between academic frameworks and the real-world implementation of practical infosec I’d developed over the years. Me? A teacher? Recording various topics and having people pay me to learn real world practical implemented security? I wasn’t so sure. I’ve thought about it – hell, I’ve tried TikTok, tried YouTube for various side projects or for fun. I’ve learned that’s just not my medium right now. I know I’d overthink every word, re-record the same three-minute snippet ten times. But that conversation was the seed that ultimately sprouted into this. But it got me thinking about something else. The patterns I keep seeing – they’re not just credit union problems. Hell, I’ve been around long enough to see them everywhere, and from different angles. How I Ended Up Here Started out on a help desk at UPS back in Chicago when I was just a kid. Worked my way through networking, server support, end user support at IBM, Sears Holdings, picked up some coding along the way. I was all over the place – big companies, small companies, different roles. Ten years into my career, I’m sitting there thinking “what do I want to be when I grow up?” That’s when a recruiter called. After looking at my background, he said something like “You’ve done a lot – help desk, networking, servers, end user support. You have a very diverse background. Have you ever thought about information security?” Information security? Honestly, it hadn’t really crossed my mind as a career path. But that conversation was the start of falling down a rabbit hole I’ve never been able to climb out of – not that I’d want to. So I jumped in. Medium-sized property management company first, learning the ropes. Then back to Sears for a few years – and let me tell you, that was an education. Three breach investigations, exposure to governance, SecOps, incident response, analysis. Lot of battle scars from that place. Then ten years ago, I moved out here to the farm and got into financial services for the first time. To be honest, I was taken aback when I first got into financial services. The diversity of maturity blew me away – and actually made me feel pretty good about where we’d gotten to in retail. The big guys were well-funded with diverse security teams, but the smaller institutions were just trying to nail down the basics. And the vendors? I couldn’t believe how archaic their methodology and practices were. Fast forward nine years, and there’s been steady improvement sector-wide. But now we’ve got ransomware holding people hostage, ransomware disguised as a middle finger after data exfiltration. To a degree, some of this reminds me of the beginning of the retail siege – it’s just different in some minor ways. Same fundamental problems, though. Different scale, different consequences, but the patterns? Identical. The Patterns That Won’t Break And that’s what finally pushed me from writing internal analysis to putting this stuff out there. Because here’s what I keep coming back to: I’m tired of watching the same fundamental failures repeat across different organizations and timeframes. I know that sounds harsh, but I’ve been in security for about 15 years now – since 2009. I’ve managed breaches, sat through regulatory examinations that would make your hair curl, and had more vendor calls than I care to count. And you know what I keep seeing? The same patterns. The same failures. The same excuses. Take the CloudFlare outage a few months back. Great incident response, right? Some companies had backup plans, rerouted traffic, and got back online fast. Everyone patted themselves on the back for their business continuity planning. But I’m sitting here thinking – wait a minute. CloudFlare isn’t just a CDN for a lot of these companies. They provide WAF protection, DDoS mitigation, all sorts of security controls. Did the companies who could bypass CloudFlare actually think about what they just turned off? I wouldn’t be surprised if in a few months we see a breach notification here or there, and if there’s enough information released and we dig into it, it could stem from prioritizing getting back online over thinking through the security implications. Because I’ve seen this movie before. Here’s the thing – whether you’re at a credit union, a manufacturing company, or a mid-size retailer, you’re not Amazon or Microsoft. Most organizations don’t have unlimited budgets or teams of security engineers. When something goes wrong, it’s usually a small team trying to figure it out while keeping the business running. Different budgets, different management priorities, different regulatory requirements – or sometimes no regulatory obligations at all. But the same fundamental challenge: making security decisions with limited resources. And I’m guessing that’s a lot more common than the industry wants to admit. I can look at a proposed integration and see the Target data breach waiting to happen, just in a slightly different scale; but the basics are there. Not because I’m smarter than anyone else, but because I’ve been through the aftermath when they go wrong. I’ve seen vendors with network access that would make your head spin – and yeah, sometimes it’s literally HVAC vendors. Same industry, same access patterns, same blind spots that got Target in trouble back in 2013. And here’s what really gets me: we, as security practitioners, all read about Target. We all said “lessons learned.” We updated our vendor management policies, required network segmentation, implemented better monitoring. But that’s only if you have the support of the business, the budget, and you’re not hyperfocused on immediate needs. The latter unfortunately is what I believe is why we’re still seeing the exact same attack patterns work. When “Secure” Vendors Aren’t I’ve sat in calls with vendors – you know, those companies that are supposed to be more secure than yours because their Software as a Service, better funded, because they’re “cloud native” and “built with security from the ground up.” Due diligence paperwork looks ok, the SOC2 looks comprehensive, talking about all their controls and compliance frameworks. Then you get to the interactive technical questions, and sometimes they contradict what they gave in the Due diligence materials. Not minor discrepancies – fundamental differences in how they actually handle data encryption and access controls. I’ve heard everything from “we’re working toward that” to one memorable explanation that it was “aspirational.” Aspirational. Like security controls are a vision board. That’s when it hit me – we’re not just seeing the same technical patterns repeat. We’re seeing the same thinking patterns repeat. The same willingness to prioritize compliance theater over actual security. The same assumption that if you check the right boxes, you’re protected. And the security industry isn’t helping. Most of the content out there is either theoretical frameworks that assume unlimited budgets, or vendor-sponsored thought leadership that’s basically marketing with better grammar. Where’s the practical guidance for organizations like mine? Where’s the analysis that asks “why does this keep happening” instead of just “here’s what happened”? What Farming Taught Me About Security I’ve been thinking about this farming analogy a lot lately. Yeah, I know, stick with me here. I grow corn and soybeans on about 400 acres outside of town. Started when I moved out here from Chicago about eight years ago. And farming teaches you things that translate pretty directly to a lot of general life and other industries. You work with the land you have, not the land you wish you had. You can’t change your soil type or your climate, but you can understand them and work within those constraints. Some things take time to develop – you can’t rush soil health or crop rotation benefits. You have to plant before you can harvest. And if you don’t understand the patterns – weather patterns, market patterns, disease cycles – you’re going to struggle. Some farmers can till their ground, others opt for no-till even though they have the option, and others are forced into no-till due to soil conditions or operating on highly erodible land. Then there’s always that old-timer who’s been farming since the ’80s and won’t change because “we’ve always done it that way” – even when the science shows better approaches. Security practitioners fall into similar patterns. You’ve got those constrained by budgets and manpower, just like farmers forced into certain practices. Others who choose their approach based on what works for their environment. And then you’ve got the security equivalent of that old-timer – practitioners who want Fort Knox-level protection because that’s how they’ve always thought about security, even when it doesn’t help the business innovate or actually function. Security is exactly the same as farming in these ways. You can’t wish for unlimited budget or perfect vendors. You can’t implement “zero trust” overnight, no matter what the marketing materials say. You can’t skip the foundational work of understanding your environment and your threats. And if you don’t recognize the patterns that lead to failures, you’re going to keep experiencing those failures. Those farming principles have shaped how I approach security over 15 years: sustainable security isn’t about having perfect solutions, because there’s no one perfect solution to security or how to implement it. All of our industries are different, but the academic frameworks try to make us all the same. That’s not the case because the business, the industry, the day-to-day operations differ so dramatically. It’s about making better decisions with imperfect information and limited resources. It’s about pattern recognition that lets you see problems coming before they hit. It’s about practical risk management that acknowledges you can’t eliminate every threat, so you better understand which ones matter most. What’s Missing from Security Content That’s what’s been missing from the security conversation. We’ve got plenty of people selling solutions and pushing products. We’ve got researchers breaking down the technical details of every new attack. We’ve got compliance experts explaining the latest regulatory requirements. But who’s connecting the dots? Who’s asking why the same or simlar patterns keep repeating? Who’s providing practical guidance for organizations that can’t afford to replace their entire infrastructure every time a new threat emerges? I’ve been that person for the past 15 years – built two InfoSec programs from the ground up, been through data breaches, learned that saying “no” isn’t security’s role. I’ve gotten better at conveying risk instead of just being the roadblock the business sees me as. My last few audit cycles have become more about maturity – auditors having to write something rather than structural gaps because we’re no longer missing the basics. I don’t know everything, I’ve made mistakes, but I’ve learned that vendors don’t solve problems – they just rebrand when new buzzwords emerge. AI, zero trust, whatever’s next – none of it fixes a broken program, but some tools can help if you use your brain, question everything, and focus on how they actually break the patterns that keep making us vulnerable. And I’m tired of keeping that knowledge to myself. What Cultivating Security Will Be So here’s what Cultivating Security is going to be: practical security wisdom for organizations that operate in the real world. Pattern analysis that connects incidents to systemic problems. Vendor risk guidance that acknowledges you can’t always walk away from a problematic vendor. Operational security lessons that work with limited budgets and small teams. I’m not going to sell you anything. I’m not going to pretend I have all the answers. And I’m definitely not going to recycle vendor marketing as thought leadership. But I am going to ask the uncomfortable questions that need asking. Like why we’re still seeing Target-style attacks a decade later. Like what it really means when companies bypass security controls for business continuity. Like whether our security has actually improved, or whether we’ve just moved the vulnerabilities to shinier infrastructure. Because after 25 years in IT and Information Security, seeing the same patterns repeat, I think it’s time someone started documenting them. And explaining what they mean. And helping other security practitioners navigate them before they become incidents. Welcome to Cultivating Security. Let’s see if we can break some patterns. The post Why Now? What 15 Years of Security Work Taught Me appeared first on Cultivating Security.
We're indexing this podcast's transcripts for the first time — this can take a minute or two. We'll show results as soon as they're ready.
No matches for "" in this podcast's transcripts.
No topics indexed yet for this podcast.
Loading reviews...
ABOUT THIS SHOW
Deep examinations of industry incidents, vendor risk, and operational security decisions from 25+ years in the field. AI-narrated episodes transform written analysis into practical insights for security professionals who need to understand what really happens when security meets operational reality. No certifications required, just real-world experience.
HOSTED BY
Cultivating Security
CATEGORIES
Loading similar podcasts...