Facepalms Per Hour

My current velocity is sometimes measured in FPH – facepalms per hour.

This is a rant. Fair warning.

I guess the new ‘Millennial’ colloquialism for “grumpy” or “sarcastic” is “salty”. So I’m feeling extra salty this week. For several reasons. One, it’s audit season. Two, I had to churn out about a dozen new reports in the span of 4 days because the manager who was supposed to be tracking that project dropped the ball and forgot they were due by the end of this month until… yeah, last Friday. Wheeeeee!

Thus, I decided, my current ‘velocity’ (a Scrum/Agile term for “how much work are you getting done”) shall be measured in FPH – Facepalms Per Hour. Currently I’m at 3. Earlier this week I was approaching double digits, when the lovely report consumers kept thinking of “just one more little thing” they forgot about until after I’d delivered the ‘final’ product.

‘Final’ actually being a meaningful adjective in this context approximately NEVER.

facepalm original picard
The original gangsta.

Change Logs

How best to describe this scenario while still maintaining separation of “real job” from “blog land”… Hrm. So let’s say we have a CRM, like most companies. This stores customers, among other things, in a database. And since it also stores sales transactions and financials, it’s heavily audited — it has a lot of change-tracking mechanisms.

Now, auditors come along and want a report of some specific type of change over time. I happily oblige. Then… PANIC! And not at the disco. “What are all these changes to these customers by these users who don’t have permission to make said changes?!?”

K, calm down sparky. Try not to sound the alarm; auditors are a sensitive bunch.

Turns out, those changes are, in a word, “fake”. You see, there’s this background “customer sync” process that keeps customers up to date with another part of the CRM where the actual changes were made. But because it’s written poorly, it treats ANY field change, even just the Name or Address (which a lot more CSRs, customer service reps, have permission to change, because, you know, that’s their job), as a change to the ENTIRE customer record on the other end. So the change tracker logs a change to every single field on the receiving end of that sync process, even though nothing really changed on the source side except maybe one or two fields.

With me so far? Great. So now the question is, “Well, can we get a report that doesn’t show those ‘fake’ changes?” But wait, it has to be “system generated” and you’re not allowed to “filter” or “add special exceptions” to it, because it still needs to be audit-able.

So what you’re saying is, give me a report that shows me what I care about, but you’re not allowed to change the logic behind said report.

Riiiiiiight.

So I give them a new report. I don’t explain how the sausage is made, I just make it and serve it up. “But why is this different from the original report?”

double facepalm

Well, do you want the audit-able answer, or the real answer? The audit-able answer is, “We made a system change that allowed us to prevent the ‘fake’ changes from being logged incorrectly.”

The real answer is, “B*tch, I AM the system!” — meaning yes, I excluded those with some hacky logic, and you need to stop asking questions about it.
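For the morbidly curious, the “hacky logic” amounts to something like this minimal sketch (table and column names are hypothetical): a “fake” change is simply one where the logged old and new values are identical, so we exclude those rows.

    -- Hypothetical change-log shape; the real one is, of course, More Complicated(tm).
    SELECT  cl.CustomerID,
            cl.FieldName,
            cl.OldValue,
            cl.NewValue,
            cl.ChangedBy,
            cl.ChangedAt
    FROM    dbo.CustomerChangeLog AS cl
    WHERE   ISNULL(cl.OldValue, '') <> ISNULL(cl.NewValue, '')  -- drop the no-op "changes"
    ORDER BY cl.ChangedAt;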

Anyway. Change Logs are super fun.

Reports

Speaking of reporting. I could really go on for pages about how terrible and broken this whole system of “request-based report development” is. But it’s frankly all we have right now. Until there’s sufficient business buy-in to the concept of agile data warehousing and collaborative cross-functional data modeling, shit just comes in one funnel and goes out another with a little sparkle spackled to it. And we call it a report.

Example, you say? Sure! Let’s say we run a special sale on certain types of widgets every quarter. We want to track how these ‘specials’ perform — do they increase our sales of those widgets? By what factor, compared to the other not-on-sale widgets? Can we trend this over several quarters?

Oh but wait. The data structures that govern widget pricing and time-span-based sale pricing, and the logic that relates customer orders to the pricing structure they used at the time of ordering, are awful, terrible, and change every time there’s a new quarterly promotional sale.

So you’re saying you want a report that trends sales of widgets based on arbitrarily changing promotional pricing as compared to other widgets that may or may not be subject to ‘normal’ pricing during that same time period, all without a simple definitive data-point that says “This is a Quarterly Promo sale, and That is Not.”

elrond's facepalm
Mister Anderson… I mean, wait.

Let’s try to get at the root of the problem, shall we? The business doesn’t seem to understand that the way they implement promo-sales is detrimental to long-term/comparative reporting. The data model makes this harder, not easier. Can we perhaps put some heads together and come up with a compromise that both A) makes more business sense, and B) improves the data model to be a bit more intuitive?

In Closing…

What’s your FPH? What causes you to facepalm on a regular basis? Let me know in the comments!  :o)

T-SQL Tuesday #115: Dear 20-year-old Self

Don’t be afraid of that big change, that big opportunity.

This month’s #tsql2sday is brought to you by Mohammad Darab, a relatively new #SQLSaturday speaker (congrats!) and all-around-great-guy. Feeling introspective, he invites us to write a letter to our 20-year-old self. Quite coincidental that I’ve been in a “letter writing mood” lately on my other blog. 😉

Dear Nate_the_College_Student

A couple things. First, DTB (rated PG-13 link). SRSLY. The high school girlfriend is NOT the one, nor does she treat you well at all. It’s gonna be over a year after you graduate, so do yourself a favor and break it off now so you can enjoy college more.

Second, study more, game less. It’s gotten ridiculous. You had straight A’s. You’ll get your first D EVER if you don’t take that network programming class seriously. And it will hurt. Your ego AND your GPA.

Finally! Hardware ain’t for you. Software is where it’s at. And data stuff. ALL the data! Someone’s gonna coin the phrase “data is the new oil”. Hey! That should be YOU! Do it.

Where the Rubber Meets the Road

Your first job is going to be amazing. You’ll spend NINE years at that company. Probably a bit too long. They weren’t very understanding when your wife was very sick and needed your help running around to doctors and pharmacies and such. But you’ll work with some truly awesome people, connections that will last many years. That’s called ‘networking’ — making sure you stay in touch with those people. It will help open up future career opportunities.

Yes, I said “wife”. We’ll get to that in a bit.

Spend your 20s doing silly things, adventurous things, but focus on your career. If I had to do it all over again, I would branch out sooner, explore more opportunities earlier, and look at the world of tech beyond this little suburb. There is so much more out there. Not that there’s anything wrong with suburbia — our town is great. I love it. But it’s not a tech-job haven. So don’t be afraid of that big change, that big opportunity.

Because once you go for it, once you finally take that chance, you’ll be oh-so-much happier! Ride that wave of confidence and of learning new things. Start attending SQL Saturdays earlier, and look into other data-centric meetups and events. There is so much to learn, and never enough time.

Oh, and PS: be super careful driving. Please.

Love

Yes, you will find love. You will find the love of your life. The One. Your soulmate. When you least expect it. You will spend your latter 20s and early 30s having the best years of your life with her. She will be your everything. She is laughter, passion, heartache, support, grace, care, light, love, and life itself. Cherish every moment.

For our time on this earth is but a whisper on the winds of eternity.

A question that’s often asked of grievers is, if they knew what would happen, would they go back, change anything, do anything different. In this thought experiment, even — would I tell my past self what would happen to my wife? No, I don’t think I would. For even now, as I look back on the million little moments that we shared and loved and laughed and cried, I would do it all again in a heartbeat. Our love was once-in-a-lifetime. I know that in my soul. And I carry that flame in my heart. May it never be extinguished.

i love you 3000 with picture of wife inside
Always. ❤

Follow-up: Cribbage “15’s Counter”

The actual method involves joining 5 copies of the table together, with each right-side table only including cards with higher ID values than the table to its left.

To be honest, my T-SQL Tuesday puzzle was a bit of a last-minute idea, which is why I didn’t have a solution ready-made. But, dear reader, you’re in luck! I have one now.

The code is over here in Gist. You can read thru it, but since the final query — the actual “answer” — is kinda ugly, let me explain my thought process.

Modeling is Important

Even when I’m putting together a silly little demo script like this, I feel that good habits and fundamentals are important. You never know what future developer might read it, copy-paste it, and say to themselves “Cool, I’m gonna follow this example when I do this other thing over here!” So you’ll see my formatting preferences, naming convention (though I must admit, I argued with myself over whether to pluralize the table names or not!), and correctly allocated Primary Keys. And since we’re modeling a card deck, even though I didn’t need to store the ‘NumValue’ (which is what you’d use for a straight/run, where the Jack is 11, Queen is 12, etc.), I did anyway.

Now, when we set up our “Hands”, we’re going to use two ‘PlayerNum’s, just so we can test two different hands at the same time. Cribbage can be played with 3 or 4 players, but we’re keeping this simple. Also, I could have built the hands more aesthetically, i.e. by selecting from Cards using PtValue and Suit, but again, I was trying to script quickly, so I just used the IDs that I knew from the previous query (the “full deck”). And again, there’s a “little extra” tidbit, the ‘IsCut’ indicator — we won’t be using that right now. If you’re still not sure what that means, go read the rules.
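If you’d rather not click through to the Gist just yet, here’s a rough sketch of that model. The real script differs in the details (and loads the full deck of data), so treat this as an approximation:

    CREATE TABLE dbo.Suits (
        SuitID   tinyint     NOT NULL PRIMARY KEY,
        SuitName varchar(10) NOT NULL   -- Hearts, Spades, Diamonds, Clubs
    );

    CREATE TABLE dbo.Cards (
        CardID   tinyint NOT NULL PRIMARY KEY,
        SuitID   tinyint NOT NULL REFERENCES dbo.Suits (SuitID),
        NumValue tinyint NOT NULL,  -- for runs: Jack = 11, Queen = 12, King = 13
        PtValue  tinyint NOT NULL   -- for counting 15s: face cards are all worth 10
    );

    CREATE TABLE dbo.Hands (
        PlayerNum tinyint NOT NULL,
        CardID    tinyint NOT NULL REFERENCES dbo.Cards (CardID),
        IsCut     bit     NOT NULL DEFAULT (0),
        PRIMARY KEY (PlayerNum, CardID)
    );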

The Method

At the end of the original post, I mentioned loops and cursors as possible routes to a solution. That may still be true, but I decided to challenge myself to avoid them. Not because they’re “always bad”, as popular media would have you believe; they’re just often an indicator that a developer isn’t thinking in set-theory when they probably should be.

Let’s start with some basic principles. You have 5 cards in your hand. It takes a minimum of two cards to make 15 (examples include Jack+5, 6+9, etc.), and up to a maximum of.. you guessed it, five cards. So we need to check all combinations of any two, three, four, or five cards. We cannot re-use a card within the same combination; and putting the same three cards in a different order, for example, does NOT count as a separate combo (another ’15’).

So as you start to think about these rules, and if you’ve been around data for a while, especially data with identity values, you might have a little light-bulb. “Aha! I know how to do that. We can simply order the combos by the ID value, and that way we won’t allow duplicates!” And that’s kinda what I did, by enforcing, via the JOIN predicates, that every subsequent derived table have a ‘CardID’ greater than the prior one. But I’m getting ahead of myself.

The actual method here involves JOINing 5 copies of the table together, mainly just on PlayerNum, but also, as I said, with each right-side table only including cards with higher ID values than the left-side. In this way, we ensure that we’re not allowing the same cards to be “joined” to each other, i.e. we’re removing them from the right-side tables.

And finally, we have four OR‘d conditions: simply “do any of those combinations add up to 15, by the Card’s PtValue?” These are echo’d in the CASE-expression in the SELECT line, where we want to essentially “show the combo”, i.e. tell you what cards make up the ’15’. (Again, for style’s sake, we have an ELSE, but we don’t really need it because it’ll never actually happen.)
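Here’s a condensed sketch of that final query. It’s not line-for-line identical to the Gist (the real thing shows friendlier card names, for one), but the bones are the same:

    WITH HandCards AS (
        SELECT h.PlayerNum, h.CardID, c.PtValue
        FROM dbo.Hands AS h
        JOIN dbo.Cards AS c ON c.CardID = h.CardID
    )
    SELECT DISTINCT   -- a 2-card combo would otherwise repeat for every c3/c4/c5 match
           c1.PlayerNum,
           CASE WHEN c1.PtValue + c2.PtValue = 15
                   THEN CONCAT(c1.CardID, '+', c2.CardID)
                WHEN c1.PtValue + c2.PtValue + c3.PtValue = 15
                   THEN CONCAT(c1.CardID, '+', c2.CardID, '+', c3.CardID)
                WHEN c1.PtValue + c2.PtValue + c3.PtValue + c4.PtValue = 15
                   THEN CONCAT(c1.CardID, '+', c2.CardID, '+', c3.CardID, '+', c4.CardID)
                WHEN c1.PtValue + c2.PtValue + c3.PtValue + c4.PtValue + c5.PtValue = 15
                   THEN CONCAT(c1.CardID, '+', c2.CardID, '+', c3.CardID, '+', c4.CardID, '+', c5.CardID)
                ELSE 'no 15'   -- style-points ELSE; the WHERE below means it never fires
           END AS Combo
    FROM HandCards AS c1
    LEFT JOIN HandCards AS c2 ON c2.PlayerNum = c1.PlayerNum AND c2.CardID > c1.CardID
    LEFT JOIN HandCards AS c3 ON c3.PlayerNum = c1.PlayerNum AND c3.CardID > c2.CardID
    LEFT JOIN HandCards AS c4 ON c4.PlayerNum = c1.PlayerNum AND c4.CardID > c3.CardID
    LEFT JOIN HandCards AS c5 ON c5.PlayerNum = c1.PlayerNum AND c5.CardID > c4.CardID
    WHERE  c1.PtValue + c2.PtValue = 15
        OR c1.PtValue + c2.PtValue + c3.PtValue = 15
        OR c1.PtValue + c2.PtValue + c3.PtValue + c4.PtValue = 15
        OR c1.PtValue + c2.PtValue + c3.PtValue + c4.PtValue + c5.PtValue = 15;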

Now, it does look kinda ugly. It’s not very extensible — meaning, if you wanted to scale it up to find the ’15’s in a 6- or 7-card hand, or you wanted to look for other kinds of combos (like ’18’s or ’27’s), you’d end up re-writing a good portion of it, or at least copy-pasting a lot. Fortunately for us, Cribbage is fairly simple in this regard — your hand is always the same size, and you only ever care about ’15’s.

(Well, and pairs, 3- and 4-of-a-kinds, straights, flushes, knobs, etc., but again, read the rules if you’re curious. We kept this very simple by limiting ourselves to just one small fraction of the game mechanics.)

The cool thing about this sample, though, at least to me, is that you’re already set up to build on it if you want to try out other Cribbage mechanics. Or even other card games, if you just use the base Suits & Cards.

What Did We Learn?

What’s the point of a puzzle like this? Well, besides introducing you to a fantastic card game, if you didn’t already know about it. The point is to make your brain think in a different way than usual. Are any of us programming card games using a SQL back-end? Probably not. (Although an in-memory equivalent like SQLite or something might be viable!) But the next time you have a “combinations problem” with some real-world data, you might wonder if a method like this could come in handy. Or at least, if it could work out better than a double-nested-loop. =)

PS: I believe, instead of the LEFT JOINs, we could have used OUTER APPLYs. We’d move the conditions from the JOINs into the inner WHERE clause of each derived table, i.e. “this ID > previous ID” and “PlayerNums are equal”. If you’re curious, try it out!
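For reference, here’s a quick (and confessedly untested) sketch of that variant, trimmed to just 2- and 3-card combos to keep it short; c4 and c5 would follow the same pattern:

    WITH HandCards AS (
        SELECT h.PlayerNum, h.CardID, c.PtValue
        FROM dbo.Hands AS h
        JOIN dbo.Cards AS c ON c.CardID = h.CardID
    )
    SELECT DISTINCT c1.PlayerNum, c1.CardID AS Card1, c2.CardID AS Card2, c3.CardID AS Card3
    FROM HandCards AS c1
    OUTER APPLY (SELECT hc.CardID, hc.PtValue
                 FROM HandCards AS hc
                 WHERE hc.PlayerNum = c1.PlayerNum   -- "PlayerNums are equal"
                   AND hc.CardID > c1.CardID) AS c2  -- "this ID > previous ID"
    OUTER APPLY (SELECT hc.CardID, hc.PtValue
                 FROM HandCards AS hc
                 WHERE hc.PlayerNum = c1.PlayerNum
                   AND hc.CardID > c2.CardID) AS c3
    WHERE c1.PtValue + c2.PtValue = 15
       OR c1.PtValue + c2.PtValue + c3.PtValue = 15;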

T-SQL Tuesday #114: A Puzzle

One of the main things a new cribbage player needs to learn is how to easily spot the combos that make ‘a 15’ (the ways to combine cards to add up to a numeric value of 15). Let’s do that with SQL!

It’s that time again! The 2nd Tuesday of the month, T-SQL Tuesday. This month’s invitation is on the lighter side, which is nice, and it comes from Matthew McGiffen (b | t). The theme is “Puzzle Party!” And I’m going to cheat, since it’s getting horribly late already and I’m lacking in inspiration.

So, I propose a puzzle! Which you must solve using SQL. Then I’ll post my own solution in a day or two. Bwahahaha.

I actually really wanted to do a Sudoku solver, but @SQLRnnr beat me to it. By a few years. =P   I might still work on that when I’m bored, just to have a standby for another blog post. Maybe we’ll compare notes.

But for now…

Do You Even Cribbage, Bro?

If you’ve never heard of the card game cribbage, it might sound weird. When you read the rules, it sounds even weirder. Legend has it that it was invented by drunk Englishmen in a pub. Reality is actually not that far off. It’s also heavily played by Navy submariners, and that’s how it was passed down in my family.

There are already many great mobile & web versions of the game, and it will quickly become obvious to anyone who’s tried to program a card game before, that a query language like T-SQL is NOT suited (omg see what I did there?) to the task. However, we can probably come up with a small sub-task of the game that’s acceptable for our purposes.

Enter: the hand scorer. There’s a nice example of a finished product here. The input would be a set of 5 ‘cards’ — the ‘hand’ has 4, and the ‘cut’ adds 1 more, used as part of each player’s hand in scoring (like community property). A ‘card’ is simply an alphanumeric value — 1-10 plus JQK (which are ‘worth’ 10 for arithmetic, but can be used like normal for ‘straights’ aka ‘runs’) — and a ‘suit’ (heart, spade, diamond, club). Think for a moment on how you’d store that as a data structure.

The output, then, is a single numeric value, the ‘score’. But how do you score? You look for the following: combinations of any numeric values that add up to 15; pairs, 3-of-a-kinds, or 4-of-a-kinds; straights (suit does not matter); a flush, if all 4 ‘hand’ cards are the same suit (and a bonus point if the ‘cut’ card matches as well). And then there’s a funky thing where you get an extra point if you have a Jack that matches the suit of the ‘cut’ card. o_@

Dude… What?

Wow, that sounds complicated, no? Let’s make it simpler. One of the main things a new cribbage player needs to learn is how to easily spot the combos that make ‘a 15’ (the ways to combine cards to add up to a numeric value of 15). For each ‘15’ you make, you score 2 points. That sounds pretty feasible in SQL, right?

For starters, we don’t really care about suit anymore. But we do need some way to distinguish the cards from each other. This is a single-deck game, so you’re never going to have more than 4 of the same number; never more than one of the same card (like the Ace of Spades). And when you’re counting combinations (not permutations, since order doesn’t matter), you can’t use the same card twice. So let’s still use the suits for card distinction; I’ll just suffix the number with an ‘h’, ‘s’, ‘d’, or ‘c’.

We also don’t care about differentiating a 10 or J/Q/K, since they’re all just worth 10, numerically. So your ‘input’ can just consist of five numbers between 1 and 10. Cool? Just find the ’15’s!

Example:

  • Your hand is 3h, 6s, 6d, 9c, and the ‘cut’ is 3c.
  • Combos for ’15’: 6s+9c, 6d+9c, 3h+3c+9c, 3h+6s+6d, 3c+6s+6d.

That’s five unique combos, for a total of 10 points! Good job, that’s a bit better than an average hand. In cribbage lingo, you’d say it like so: “fifteen two, fifteen four, fifteen six, fifteen eight, and fifteen ten.” Or if you’re playing with more experienced folks, you’d abbreviate to simply “two four six eight ten”.

In “normal” programming land, we’d probably use a loop and some branching logic. What will we do in SQL? A loop, a cursor, or something more (or less!) elegant? You decide!

I’ll come up with something solution-y soon. Update: Solution posted! Enjoy! ❤

cribbage board close-up of winning peg and partial hand
Red won by 2 points! Close game.

T-SQL Tuesday #113: Personal-Use Databases

So when I dived down the rabbit-hole of the Nested Set Model, of course I created a sample database to write & test the code against.

It’s that time again! This month, Todd Kleinhans (b/t) asks us how we use databases in our day to day life, i.e. personal use or “outside of our work / day-job”. Actually, the question is kinda vague — because if you think about it, we all use TONS of databases in our daily lives. Your phone’s contact list, your calendar, online shopping, banking.. the list goes on. As I’ve said before, Data is everything.

But what I think he meant, and the way most of the community has interpreted it, is “How do you manage/administrate/build/work-with/develop databases in your day-to-day life outside of work?”. So we’ll go with that.

Now this may out me as “not a real DBA” or some such nonsense, but honestly.. I don’t spend much of my time creating silly playground databases. Not that anybody else’s are ‘silly’ — just look at some of the fantastic posts for this month! Such neat ideas brought to life.

Special shout-out to Kenneth Fisher, who, if you look closely at his screenshot (and it’s not even related to this post), committed the abhorrent sin of creating a database name of pure emojis — FOR SHAME sir! But also you’re awesome. ❤

Me, I’m more of a quick-n-dirty spreadsheet guy. If I need, say, an inventory of my computer parts & gadgets so I know what I can & can’t repair, what materials I have to work with as I tinker, etc.. well, I create a Google Sheet. And it serves my needs well enough. (Spoiler alert: yes, you can view that one; I shared it. But it’s fairly outdated since I moved in March and haven’t had time to re-do inventory since.. last autumn.)

But for blogging in the tech field, you gotta get your hands dirty. So when I dived down the rabbit-hole of the Nested Set Model, of course I created a sample database to write & test the code against. And there have been some additional bits & pieces for blog demos and GitHub samples.

Most of the time, I’m creating databases / entities on SQL 2016 Developer Edition. Of course by now, that’s 2 major versions ‘behind’, but since I don’t run Linux personally (yet?), and I’m not a conference speaker (yet??), I don’t feel a burning need to upgrade. It’s FAR superior to Express Edition, though, so please for the love of all that is holy, if you find yourself using Express for personal/playground use, save yourself the headache and go grab Developer.

Containers/Docker? Meh. If you want to start playing with those, definitely look at 2017 or higher. It sounds appealing in theory — “just spin it up when you need it, spin it down when you don’t!” — and I’m sure that’s great if you’re starved for resources on whatever laptop you’re working with, but if you’ve done your due diligence and set your local SQL instance to appropriate resource limitations (hello, ‘max server memory’ and file-growths!), I’ve found that its impact is quite tolerable.

But come now. Surely this isn’t just a “shameless self-promotion” post or a grumpy-old-DBA “get off my lawn” post. Right?? Right!

To you folks out there creating your own nifty little databases for personal projects, learning/development, or even hopes & dreams of building a killer app on top of it one day — you’re amazing! Keep doing what you do, and write about it, because I love reading about it. Heck, go try to create the same model in PostgreSQL or MariaDB and see how it goes. We could all use a little cross-stack exposure once in a while.

That’s all I have for this month; short & sweet. I need to finalize plans for virtualizing our main SQL instances (which is really just a migration off bare-metal & onto VMs) within the coming weeks. Yes, we’re that far behind the curve. Now get off my lawn!

=P

clint eastwood frowning angrily
I’m old and racist! But I’m still adorable for some reason!

T-SQL Tuesday #112: Cookies!!

..this analogy of “dipping into the cookie jar”. What events or accomplishments can I take sustenance from, draw strength from, during these times?

tsql2sday150x150

Hi folks. It’s been a minute. Frankly it’s been a rough 5 months. From losing my wife, to dealing with the holidays and her birthday, to moving houses again. On the career front, I’m faced with the challenge of virtualizing our core business-critical SQL instances with minimal downtime. And obviously, because of all that personal/life stuff, it’s been difficult to stay focused and productive.

So this #tsql2sday’s topic is poignant, I suppose — this analogy of “dipping into the cookie jar”. What events or accomplishments can I take sustenance from, draw strength from, during these times?

As usual, we must thank our host, Shane O’Neill (b|t), and remind readers to check out tsqltuesday.com.

Encouragement

Back when I first took this current job, I was worried sick about doing the commute every day. One and a half to two hours in traffic each way. Even if I used toll-roads, it might save 10-15 minutes, but it was still super stressful. But my wife never stopped encouraging me, telling me it would pay off. She put up with the crazy hours, she checked on me periodically, she stayed on the phone to keep me awake sometimes. She reminded me often, when the time was right, to have the telecommute/remote-work conversation with management.

And of course, to nobody’s surprise, she was right. I now work from home 4 days a week, take a vanpool the 5th day, and am much happier and more productive (in general!), much less stressed, and a markedly better driver. More importantly, because of her unwavering support, I can still look back and draw renewed energy from those memories and from her still-present spirit that stays with me in my heart.

the wife is always right
GOTO 1

Accomplishment

One of the first big projects on my plate was upgrading & migrating the SQL 2005 instances to new hardware and SQL 2016. We didn’t use anything super fancy for the actual migration, just backup shipping, essentially. DbaTools came in very handy for copying over the logins/users without having to hash/unhash passwords. The main hiccups were around Agent Jobs and Replication. Jobs were turned off at the old server before being enabled on the new one, but due to a lack of documentation and understanding of some of them, it was difficult to tell which were critically needed “now” vs. which could wait a while, and which had dependencies on other SQL instances (via linked-servers) that may or may not have moved.

And replication is just a PITA in general. Fortunately, I took this as a ‘growth opportunity’ to more fully document and understand the replication environment we were dealing with. So at the end of the project, we had a full list of replication pub-subs with their articles, a sense of how long they each took to re-initialize, and a decision-process to consult when adding-to or updating articles within them.

Continuous Learning

A similar upgrade-migration project came to fruition in 2017: the ERP system upgrade. This was a delicious combo meal of both the database instance upgrade (SQL 2008R2 to 2016) and the application & related services/middleware (Dynamics NAV 2009 to 2017). And as I blogged about, it came with a super sticky horrible side-effect that we’re still dealing with over a year later: a different collation than the rest of our DB environment.

Which reminds me, I need to do some follow-up posts to that. Briefly, the “best” band-aids so far have been thus:

  1. If only dealing with the ERP entities, without joins to other DBs, just leave collations alone. The presentation layer won’t care (whether that’s SSRS or a .NET app).
  2. If relating (joining) ERP entities to other DB entities, I’ll load the ERP data into temp-tables that use the other DB’s collation, then join to those temp-tables (see the sketch below). The “conversion” is essentially ‘free on write’, meaning when we load the weird-collation data into a temp-table that’s using the normal collation, there’s no penalty for it.
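In practice, it looks something like this minimal sketch (the entity names are made up; the COLLATE clause on the way into the temp-table is what makes the conversion ‘free on write’):

    -- Pull the odd-collation ERP data into a temp-table using our DB's collation:
    SELECT  CustomerNo   = erp.CustomerNo COLLATE DATABASE_DEFAULT,
            CustomerName = erp.CustomerName COLLATE DATABASE_DEFAULT
    INTO    #ErpCustomers
    FROM    ErpDb.dbo.Customers AS erp;

    -- Now the join predicates compare like-collated columns, no penalty:
    SELECT  c.CustomerName, o.OrderTotal
    FROM    #ErpCustomers AS c
    JOIN    dbo.Orders    AS o ON o.CustomerNo = c.CustomerNo;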

As I said, I’ll try to dive into this more in a future post. It’s been a life-saver for performance problems that cropped up as a result of the upgrade & the different collations.

But the point here is that, even though this project didn’t end up as wildly successful as we’d hoped, it’s still a success, because we learned some key lessons and were able to pivot effectively to address the problems in a reasonable way. And frankly, there was no going back anyway; it’s not like the business would have said “Oh, never mind, we’ll just stick with the old versions of everything” (even though some reticent managers would probably have enjoyed that!). So even when things seem bleak, there’s always a way around.

when the going gets tough, the tough take a coffee break
Wait a minute…

Conclusion

I’m still trying to figure out what this new chapter of my life looks like, without her. I’m still trying to be the very best DBA & BI-Dev I can, supporting the dozens of requests & projects that the business throws at us. Fortunately, with the incredible SQL Community, a wonderfully supportive family, and a fantastic team of colleagues, I remember how far I’ve come, on whose shoulders I stand, and how much promise the future holds.

Even though the one person I was meant to share it all with is gone, she still smiles down and encourages me in subtle ways.

…with love & light ❤

5 Things I Learned at SQLSaturday

Go find a SQL Saturday near you, at sqlsaturday.com!

The weekend before last, I attended my 4th #SQLSaturday event; my 2nd in San Diego CA (the others were Orange County CA, which is equally fantastic, and a little closer to work, though about the same distance from home). If you haven’t heard of or been to one, check out the home page and find one coming to a city near you! They’re fabulous FREE training events for the MS data platform, including but certainly not limited to SQL Server. For example, you’ll almost always find Kevin Feasel talking about aRrr or Kafka or Hadoop.

Did I mention free?

So I thought I’d share a few things that I learned this time!

The LinkedIn app’s Killer Feature

Did you know? The LinkedIn app has a “find nearby” feature that uses magic (okay, your phone’s various radios) to instantly connect you with a fellow user who has the app open near you. It’s awesome! Now you don’t even have to look up from your convention coffee and security-blanket (phone) to network — just fire up the app, go to the People tab, hit “Find Nearby”, and commence trolling. =P

No, that’s horrible; be a normal human and talk to people. The tech is just there to help make the post-conversation connection.

linked-in find-nearby button

Storage Myths Busted

This was an interesting and even slightly entertaining session presented by Max @ SQLHA. One analogy that really stood out to me was this:

SANs have become a bit like the printer industry — You don’t pay a lot for the enclosure, the device itself, i.e. the SAN box & software; but you pay through the nose for ‘refills’, i.e. the drives that your SAN vendor gods deem worthy of their enclosure.

It’s frighteningly accurate. Ask your storage admin what it costs to add a single drive (or pair of drives, if you’re using something with built-in redundancy) to your SAN. Then compare that cost with the same exact drive off the retail market. It’s highway robbery. And we’re letting them get away with it because we can’t evolve fast enough to take advantage of storage virtualization tech (S2D, SOFS, RDMA) that effectively makes servers with locally attached SSDs a superior architecture. (As long as they’re not using a horribly outdated interface like SAS!)

Data Protection and Privacy is Hard

But completely necessary. We all need to become security practitioners to some extent. Even if it just means raising and documenting our concerns to our bosses. The great state of California has even jumped on the bandwagon with its very own privacy act. Still in the early stages, yet on the heels of GDPR, it can only mean that there will be more to come.

Here are a few concrete action items from this, which should be “fairly simple” (with a big ol’ asterisk that says “depending on your organization and your change-management process”).

what if i told you you don't need 'sa'
For anything. Ever.

  1. At least encrypt your database backups; see the sketch after this list. (And make a plan to implement some kind of “full” encryption across the data estate, be it TDE or AE or app-driven encryption via your developer teams.)
  2. Stop using sa! Reset the password, and disable it (also sketched below). Yes, your Agent Jobs will still run just fine.
  3. Disable Named Pipes & Shared Memory protocols; just use TCP/IP. Disable the SQL Browser service.
  4. Cut off your SQL servers from the public Internet (yes, you should still patch them; just download the patches to a fileshare and install them using PowerShell!). And stop letting people run SSMS on the server; that’s what client machines are for!
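Here’s a minimal sketch of items 1 and 2. The commands are standard T-SQL, but the names and passwords are placeholders, so test in a sandbox first:

    -- 1) Backup encryption requires a certificate. Back the certificate up
    --    somewhere safe, or your encrypted backups become paperweights!
    USE master;
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPasswordHere>';
    CREATE CERTIFICATE BackupCert WITH SUBJECT = 'Backup encryption certificate';

    BACKUP DATABASE MyDb
        TO DISK = N'D:\Backups\MyDb.bak'
        WITH COMPRESSION,
             ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);

    -- 2) Neuter 'sa': scramble the password, rename it, disable it.
    ALTER LOGIN sa WITH PASSWORD = '<LongRandomGibberish>';
    ALTER LOGIN sa WITH NAME = [definitely_not_sa];
    ALTER LOGIN [definitely_not_sa] DISABLE;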

Columnstore All The Things!

Seriously. If you’re not using them yet, read about them, play with them, and start using them. They’re magic.

Okay, that’s a bit dramatic. As with any technology and feature, you need to know the WHY. Understand what the best use-cases are and how that translates to your own environment.

columnstore all the tables
Easy there sparky…

Here are just a few of the tips I gleaned from the session on this:

  • They were designed for data warehouses, but…
  • They’re also great for “operational analytics” — where you want to do aggregate reporting on your ‘live’ data, but that performance usually kinda sucks (and you don’t want to lock up those tables anyway).
  • Best with SQL 2016 or higher; 2012’s “v1” implementation was horrible, and 2014’s “v2” was semi-usable but still had some major drawbacks
  • Best candidate tables are “very large” (millions of rows or more), and best candidate columns have “low cardinality”, meaning they’re not full of unique values — they should be “compressible”. A simple example would be a Customer’s “State of residence” — you probably have millions of customers, but only 50-ish “State”s, and your typical report is asking “Hey, how many Customers per State ‘do X’” — that’s a great candidate for a columnstore index (see the sketch after this list).
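To make that concrete, here’s a minimal sketch of the “operational analytics” scenario (SQL 2016+; the table and column names are hypothetical):

    -- A nonclustered columnstore index on a live OLTP table, covering just the
    -- reporting columns, lets aggregate queries scan compressed column segments:
    CREATE NONCLUSTERED COLUMNSTORE INDEX NCCI_Customers_Analytics
        ON dbo.Customers (StateCode, CustomerType, CreatedDate);

    -- The classic "how many Customers per State" report now hits the columnstore:
    SELECT  StateCode, COUNT(*) AS CustomerCount
    FROM    dbo.Customers
    GROUP BY StateCode;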

Users Don’t Like Date-Pickers

I mean they’re still better than text-entry boxes, but we can do better. Talking about SSRS here — reporting services, i.e. “the poor-man’s Tableau”.

Picture a typical business user, middle-manager type, going to your SSRS report that you just built for him/her. The first thing it asks them to do is “pick a start-date and an end-date” to define the “reporting period”. But you should know by now that they almost always want to see the current Fiscal Quarter. So you default them to those dates that define the current quarter. Great!

Now they want to quickly compare to the previous quarter, or the same quarter of the previous Fiscal Year. Ruh-roh. Nobody wants to go messing with those lame date-pickers with the pop-up calendar thing.

Give them a clickable label instead, which says “Compare to last Quarter” or “Previous Fiscal Year”.

The click action should be to “drill through” to the same report, while changing the (now internal/hidden) date parameters to the appropriate dates. Presto! The user only had to click once to get exactly what they wanted. Much nicer experience.
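The only even mildly tricky part is the date math behind those hidden parameters. Here’s a hedged sketch in T-SQL, using plain calendar quarters (your fiscal calendar may need an offset):

    DECLARE @Today date = GETDATE();

    -- First day of the current quarter, then derive the previous quarter:
    DECLARE @CurrQtrStart date = DATEADD(QUARTER, DATEDIFF(QUARTER, 0, @Today), 0);
    DECLARE @PrevQtrStart date = DATEADD(QUARTER, -1, @CurrQtrStart);
    DECLARE @PrevQtrEnd   date = DATEADD(DAY, -1, @CurrQtrStart);

    SELECT @PrevQtrStart AS StartDate, @PrevQtrEnd AS EndDate;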

I’ll try to have a future post going into detail on this. I’ve got tons of ideas swimming around in my head after FishHeadTed‘s excellent SSRS classes, and not nearly enough time in the day to flesh them out.

i see what you did there?
Get it? Swimming, fish?!?

Stay tuned, and go find a SQLSaturday near you!

Career Day: SQL DBA

Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!

Our benevolent overlord prompted us (months ago; I know, I’m always late to the party) to write about what we do in our day-to-day jobs as SQL Server professionals, aka #SQLCareer. The actual idea is to do 4 posts in a row, for 4 working days (either adjacently or the same day each week for 4 weeks).

Sticklers, us? Nope, never.

Without any kind of future guarantees, here is my first.

Morning Mash-up

Emails, tickets, and alerts, oh my! Today was actually pretty quiet on the monitoring front — no major performance issues, no interruptions, and no job failures. That one that occasionally takes 8+ hours did okay last night. More often than not, I have to kill it if I find it’s been running into the work-day. That should be automated. /backlogged

Re-wrote some analyst’s data change script to be more “friendly” and set-based, instead of relying on identity values — which matters because those can differ between Dev & Production environments. Deployed a few change requests, including that one.
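To illustrate the idea (with entirely made-up names): target rows by a stable business key instead of a hard-coded identity value that only exists in one environment.

    -- Brittle: assumes the identity value is the same in Dev and Production.
    -- UPDATE dbo.Accounts SET AccountStatus = 'Closed' WHERE AccountID = 4217;

    -- Friendlier: set-based, keyed on a natural/business key.
    UPDATE a
    SET    a.AccountStatus = 'Closed'
    FROM   dbo.Accounts AS a
    WHERE  a.AccountNumber IN ('ACCT-1001', 'ACCT-1002')
      AND  a.AccountStatus <> 'Closed';   -- idempotent: safe to re-run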

virtualize all the things
Even SQL? Yes! Especially SQL. Just know the caveats.

On the side of less tedium, I fleshed-out more of the plan to virtualize our remaining bare-metal SQL servers. The usual ‘gotchas’ have to be discussed with the Server Admins — reserving compute resources, preventing the ‘noisy neighbor’ problem, and having enough storage to do the migration.

Yes, that means about 2x the storage as they’re currently using. Got a problem with that? No? Good. 😉

Finally, I worked on some code for querying the SSRS ReportServer database to obtain report metadata (including stored-proc usage, parameters, and user activity). The core concepts came from Ted Stathakis’s SQLSaturday session on SSRS 2016. Hopefully, some of that code will be coming to GitHub soon.

Afternoon Delight

“Delight” being sarcastic. No really, I love my job, but sometimes there are just some things that make you go ‘grr arg’.

First up, developing and finalizing some “data cleanup effort” reports. These are things that show the business users “what’s wrong with the data” and give them some guidance on how to “fix it”. Now, because there is no easy “management GUI” for this data, and because actual changes need to go through change-control, it’s not going to be a cake-walk. But given the right inputs and some domain knowledge, they can make the decisions as to what those changes should be, and then hand them off to the BA’s (business analysts). Usually in Excel form.

Excel spreadsheet with woman screaming in front of it
What? It’s under 1,048,576 rows! Barely…

Next we have the ‘grr’ part. This is where I spent about 40 minutes trying to explain to the 3rd-party software vendor exactly where and how to integrate with our core data systems. Most of the info should not be ‘new’ to them, but since I never know which support tech will be assigned to the issue, it’s best to err on the side of verbosity. To make things more confusing, the folks on my end, who have been interfacing with the vendor thus far, aren’t intimately familiar with the underlying technology bits, so the conversation gets a little dicey before we all arrive on the same page.

Fortunately, since I had their attention, I was able to sneak in a suggestion to upgrade the back-end database for the software (which we self-host, but is essentially managed by them & their software) to something from this decade. MySQL 5.0 is old, guys, mmkay?

Wrapping Up

A developer needs a new table & some initial data-load to production; it’s passed QA. Great! They wrote the query a little weird, but nothing ol’ SQLPrompt can’t fix.

Commit outstanding stored-proc code & RDL files (SSRS reports) to source control. Even if I’m the only one working on them, I’d rather they be safe & track-able.

Ruh-roh. A developer set off some proc in a Dev db; it ran for over 30 minutes… and then they cancelled it! But wait, it’s stuck in rollback. Yes, kids, rolling-back a transaction can take even longer than the transaction itself, due to rollbacks being single-threaded.

Now, since the user hitting the ‘Cancel’ button (aka “Stop” in SSMS) does not wholly and entirely equate to a kill SPID command, I do that for them (since only sysadmin or processadmin can kill sessions). Then I run ye olde kill 64 with statusonly (64 happened to be the SPID) to check on its rollback status. Good news, everyone! It’s at… 0%.

bad news nobody
You heard me.

And it stays at 0%. For fifteen minutes.

I move on to other tasks of course, since there’s nothing more I can do (short of pulling the plug on the server, which is never a good idea, unless you’re testing your Disaster Recovery Hope plan!). We also create a ticket to fix this stored-proc so that it doesn’t “take forever”, even if it has to churn through millions of records.

Finally, the statusonly check starts moving. It’s about 45% rolled back, with about “300 seconds remaining”. This, as anybody who’s been a DBA will tell you, is in “Microsoft time”. You know those progress-bars you typically get in Windows while something is happening? They’ll start off fast, make it to around 90-some percent, tell you there’s about 1 minute remaining, then take at least five minutes to “finish that last little bit”? Yeah, that’s a “Microsoft minute”. Patent-pending.

But fortunately for us, it does finish in just about the time it promised, maybe a bit longer. Just in time for the Dev to test what they wanted to test before going home for the day. And just in time for me to have my evening not ruined by a runaway transaction.

PS: Anybody who’s ever worked with SQL, including yours truly, has done this (set off a long transaction and/or tried to rollback a long transaction) at least once in their career. If you ask a seasoned DBA or DB-Dev, and they say “No, I’ve never done that!”, they’re a dirty rotten liar. =D

Are You Not Entertained?

If the life of a DBA sounds like fun to you, get in touch with your local IT recruiter and find out what the job market is like! You’ll hear all sorts of doom & gloom about how “the Cloud is taking over everything” and “the DBA is dead; long live DevSecDataOpsUnicorns!”.

No, I joke. Yes, some of the fundamental responsibilities of the DBA are shifting to the cloud vendors, to IaaS/PaaS services, but at the end of the day, being in IT is about learning tech and working with people. Build a solid CS/IT foundation, build your soft-skills, and then learn the latest & greatest tech stack. Or the mature enterprise stuff. There’s plenty of room for both!

tee-shirt that says "tough enough to be a dba, crazy enough to love it"
We end on a happy note.

T-SQL Tuesday 106: 5 Things Not to Do With Triggers

Here are a handful of anti-patterns that I’ve seen with triggers in my time…

This month’s invite comes once again from our benevolent overlord* community pillar Steve Jones, head of SQLServerCentral.com! If you haven’t been there, stop reading immediately and go check out the helpful forums and ‘Stairways’ articles. Some truly excellent content to be had.

No, don’t stop reading immediately… save it to your favorites/reading-list and look at it in 5 minutes when you’re done here. =)

*Though I’ve not had the pleasure of meeting him in person, I’ve heard Steve is a phenomenally humble and down-to-earth guy, so my silly comment about him is even sillier in that light. ❤

Triggers – Love ’em or Hate ’em

Borrowing a bit of a mini-game from the CodingBlocks guys, the first “Google-feud” (auto-complete result) for “sql server triggers are ” is sql server triggers are bad. Well, they can be, if used improperly / to excess. Like most bits of tech, they have their place and their use-cases.

I’ve blogged about a few such use-cases here (a “who did what” audit-trail type of trigger) and here (as part of the Nested Set Model implementation), which you should definitely read. I didn’t condone click-baity titles back then. Alas…

click this right now omg
All we need now is a “blink” tag…

Here are a handful of anti-patterns that I’ve seen with triggers in my time.

Thing 1: Using them as Queues

Repeat after me:

A trigger is not a queue.

Triggers are executed within the same transaction as the query that fires them. Meaning, they’re subject to all that ACID-y goodness of a transactional system. This is a double-edged sword. If all is successful, it guarantees that trigger will do its job when the calling query runs, which is great for audit-ability. On the other hand, if the trigger has a problem, anything and everything that triggers it will also fail.

The fundamental distinction between this and a queue, is that the success of the queued action is not immediately critical to the continued operation of the thing that called (queued) it.

So if your requirement matches the latter behavior, and not the former, do us all a favor and use a real queue. Heck, find one of the few people who know Service Broker. (Hint: one’s a Warewolf, another’s a Poolboy.)

Thing 2: Making them WAY TOO BIG

Mostly because of the transactional thing, the rule of thumb with triggers is K.I.S.S. “Keep it Small and Simple.” Even the audit-trail example is a bit much if the table being audited is under significant write load. Typically, if the business requirements are both high-throughput and high audit-ability, you’ll be implementing a much more complicated tech-stack than just plain ol’ SQL Server with triggers.
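For a sense of scale, “small and simple” means something on the order of this sketch (hypothetical tables; my earlier audit-trail post has the fuller treatment):

    -- A deliberately tiny "who did what" audit trigger: one insert, no branching.
    CREATE TRIGGER dbo.TR_Customers_Audit
    ON dbo.Customers
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.CustomersAudit (CustomerID, ModifiedBy, ModifiedAt)
        SELECT i.CustomerID, SUSER_SNAME(), SYSDATETIME()
        FROM   inserted AS i;
    END;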

Some of the offenders I’ve seen include: A trigger that wanted to write to several other tables with IF conditions that branched based on what column was being updated. And a trigger that required near-SA level permissions to do some back-end maintenance-y stuff. Those are both recipes for problem pie.

pie that looks like a kraken attacking a boat
We’re gonna need a bigger… fork?

Thing 3: Doing Nothing Useful

Somewhat opposite of above, there’s no point in introducing the management and performance overhead of triggers if they’re not performing a transaction-critical operation. For instance, something better left to a queue.

Thing 4: Housing Business Logic

There’s always been a holy war about whether “business logic” belongs in the database or in the application code. And, since geeks love sub-classifying things to the Nth degree, there’s a sub-holy-war about what “business logic” actually means. But I won’t go into that here.

If you fall on the 1st side of the fence, and you feel the desperate desire to embed some logic in the data layer, it belongs in stored procedures, not in triggers. Reasons include maintainability, discoverability, performance, and documentability. Not to mention source-control on procs is a helluva lot easier than on triggers.

Thing 5: Too Many of Them

there's too many of them
Stay on target..

While multiple triggers can be defined for the same action on the same table, that’s not an invitation. You can enforce trigger execution order to an extent (first and last, as sketched below), but any more than that and you’re asking for confusion. Falling back on the KISS principle, if you need more than one trigger on a table & action (insert, update, or delete), you probably need to rethink the underlying design.
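If you’re stuck with multiple triggers, you can at least pin which ones fire first and last (anything in between fires in undefined order); the trigger name here is hypothetical:

    EXEC sp_settriggerorder
         @triggername = N'dbo.TR_Customers_Audit',
         @order       = N'First',
         @stmttype    = N'UPDATE';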

Bonus

Using INSTEAD OF vs. AFTER: it’s fairly self-explanatory, but just be aware of what you’re doing, especially with the former. You’re literally replacing the desired action of, say, an update query with the contents of your INSTEAD OF UPDATE trigger. If this is not obvious to all users of this table, you’re in for some really surprised faces and angry messages.

And that’s all I have for today folks! Enjoy triggering stuff. In moderation, as all things. =)

SQL Style Guide – a Rebuttal

Both the original article, and my responses, are guidelines, meant to help you start a conversation with your own team about how you can get better at writing code.

In what will surely be a controversial post, I give my take on some of the major points of this “SQL style guide” that made the rounds on Hacker News / Reddit recently a few years ago. Hey, I never claimed to be a source of breaking news.

stamp of controversy
Feels so empty, without me…

Now remember kids, these are opinions — everyone’s entitled to theirs, and it doesn’t mean you’re right or wrong. As the author repeatedly points out, consistency among a team / project / environment is more important than anything else. Both the original article, and my responses, are guidelines, written by people with some experience (guided by others with more experience) to help you start a conversation with your own team about how you can get better at writing code. Because that’s what we’re paid to do, among other things.

Basics

I agree with most of the points here:

  • Consistent descriptive names (I would add “concise” too — autocomplete/intellisense has come a long way, but we’re still typing stuff sometimes)
  • White space & indentation
  • Datetime literals – hail Saint Bertrand
  • Comments – you, dear reader, know my thoughts already
  • Some OOP principles should be avoided because they usually lead to poor performance
  • Hungarian notation is best left in the ’80s where it belongs

Something I don’t abide by:

  • Sticking to the ‘standard’ and avoiding vendor-specific functions/features

Some brief justification for my rejection:

Database code portability is largely a myth — it sounds great, in theory, but most teams/projects never actually do it. If your data persistence layer is ever slated for migration to a new tech-stack, it’s going to require a massive overhaul anyway, in which those vendor-specific functions/code-bits will be the least of your worries. More likely, you’ll be swapping-out pieces of the data layer in small chunks to move into other storage tech, like a NoSQL store or a DocumentDB or something; and eventually over time, the whole original data layer will have been moved, and the concern over SQL code portability will be moot.

Furthermore, database vendors give you these features/functions for a reason — they’ve found that it greatly enhances their product and the productivity of developers who work with it. So why not take advantage?

Finally, if your application is ‘cutting-edge’ enough that ALL db access is done via ORM or some kind of repository layer in the code-base… guess what? You don’t have this problem in the first place! Because the database is a dumb state storage mechanism, containing little to no actual code whatsoever (stored procs, functions, etc.). So, port away!

now that's what i call edgy
I can almost feel the edgy-ness…

Other basic issues:

  • CamelCase (actually TitleCase) is pretty standard in a lot of DB schemas, and I see nothing wrong with it. Despite my love of underscores (snake_case), it does make for more awkward typing.
  • Plural or singular entity names should match the convention preferred by your overlaying ORM, if you’re at the point of DB design; most of the time, though, you’re working with a DB that you’ve inherited and you have no control over entity naming anyway, so stop whining about it and get on with life.
  • Leading vs. trailing commas: I prefer leading, but the arguments against it can sound convincing (not just his, but in general in the tech community) — my experience leans toward being more likely to futz with the middle-to-end of a list than the beginning (1st item), thus making the leading commas more likely to help, but that’s just me. Also, thanks to an awesome member of the Coding Blocks Slack, a point in my favor is that source code comparison (diff) tools will only show the one changed line instead of two, if you’ve had to add to the end of the column list.

Naming Conventions

Yes, please avoid reserved keywords, replace spaces with underscores (or use TitleCase to avoid having spaces), and use concise yet meaningful table aliases when you’re writing queries. I still remember, when I first started working at my current company, literally gasping and cursing under my breath when I found that some databases actually had a space in the name.

Beyond that, the article goes a bit too deep in the weeds for me, especially the whole “known suffixes” thing — because isn’t that just Hungarian notation on the other end? How about names that make it intuitive, such as IsActive for a bit flag, or LineNumber or RecordSequence for a sequential integer that’s not auto-generated (not an identity value), or @NumMonths as a parameter for a stored-proc that indicates how many months of reporting to fetch? Common sense should rule the day, not arcane prefix/suffix conventions that will never be documented or enforced.

White Space

This whole notion of a “river” feels strange and awkward. It’s made worse by the fact that some clauses’ keywords are “too large” for the “standard” river width (which is the width of the SELECT keyword, ish), such as GROUP BY and LEFT JOIN. Plus, I’ve yet to see truly excellent tooling support for this kind of style (be it VSCode, Visual Studio, SSMS, SQL Prompt, or other styling tools / auto-formatters). Given that I still largely write my code without continuous automatic re-style-on-the-fly styling assistance, I find this hard to digest.

Side-bar: big kudos to the author for pointing me at this typography article, which challenged my long-ingrained writing preference of double-spacing between sentences. Even now, I do it while I write this post, knowing it’s wrong — I can’t help myself!

For similar reasons, JOINs and sub-queries don’t need to be “on the other side of the river” — since I’m actually saying “there is no river”, what I mean is, don’t indent those JOINs or sub-queries excessively. In the FROM clause, the JOINed tables are just as important as the first one, so I don’t see the need to reduce their importance by putting them so far to the right. And please for the love of all things holy, stop putting the JOIN predicates in-line (on the same line) after the joined table — put the ON conditions on their own line and indent them!

Sub-queries are a strange animal, and I won’t dive deep on it here. Basically, I try to style them the same as I would normally, just indented to the context of their scope; and within Javascript-style parentheses — meaning, open-paren on the preceding line (or its own line), then the body, then close-paren on its own line to finish.

Special note about the “Preferred formalities” section

BETWEEN is mostly evil. I’m not saying you should never use it; just be very clear about why you’re using it, and only use it with discrete value types (DATE, INT), NOT with continuous (or conceptually/nearly-continuous) value types (DATETIME, REAL/NUMERIC/DECIMAL).
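The classic example (with a hypothetical Orders table): BETWEEN is inclusive on both ends, which bites you with DATETIME columns. A half-open range is safer:

    -- Risky: only catches June 30th rows stamped exactly at midnight.
    SELECT * FROM dbo.Orders
    WHERE OrderDate BETWEEN '2019-06-01' AND '2019-06-30';

    -- Safer: half-open range covers the entire month, sub-seconds and all.
    SELECT * FROM dbo.Orders
    WHERE OrderDate >= '2019-06-01'
      AND OrderDate <  '2019-07-01';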

The UNION operator is often misused, usually because UNION ALL is preferred (and is what you really meant anyway), but a blanket statement to “avoid it” misses the point of why it exists in the first place. Likewise, temporary tables (#temptables) are wonderful tools when used properly and in moderation, but flagrant overuse can lead to what I call #tempocalypse (which means you’re hammering your TempDB so hard that its underlying storage system screams for mercy).

cry for help cheeto puffs
Like Cheeto Puffs, your TempDB has become bloated and full of cheeze.

Misnamed section “Create syntax”

What he really does here is expound upon table and database design principles. This does not belong in a “Style Guide”; it probably belongs in a “Design Guide”, but because the relational database as a technology is so mature (yes, that means ‘old’) by this point, most of this feels completely unnecessary and redundant. And again, you’re working with inherited designs over 90% of the time, where you don’t get to make these decisions, because they were made for you by your predecessors. If you are so lucky as to be designing a relational model from scratch, look for advice from the tried-and-true architects of the trade.

I do echo and reiterate his advice to look at other data storage tech for things that are not ideally served by an RDBMS, such as EAV models, document storage, key-value storage, text search, etc.  There is no shortage of NoSQL tech to meet those needs, and they will do a much better job of it than you or I could in designing a hacked-up SQL model that somewhat does the job but falls over flat when faced with scaling up to a larger size or heavier workload.

In Conclusion

As I said at the beginning, these are opinions. I applaud the author’s effort, and the fact that he actually got something together, made it open-source on GitHub, asked for feedback, and actually got traction with several communities discussing it. That’s more than I can say for myself at this point!

I hope that this spurs some discussion on your database development team, and perhaps helps you work toward making a consistent style guide for your own environment. Even if it’s just water-cooler talk, it’s always good to remember why we write code: so that others can read it. (Sure, it’s also to make stuff happen, i.e. “make the computer do things”, but that’s a side-effect — your main goal should always be readability and clarity.)

Do you have some comments, thoughts, disagree? Leave me a comment! I’d love to hear from you. 🙂

needs-moar-drama
Drama-Cat is bored…