Movie Reviews and the Killer Database Collation

If you have a core database using a different collation than the rest of the DBs around it, BAD THINGS HAPPEN.

And we’re back!  Hi folks, thanks for being patient with my December hiatus.  The holiday season is always a little hectic but this year it felt especially sudden.  And hey, you all have better things to do than read a blog in between the home cooked meals and the family gatherings.. like sleep, shop, and go see all the new movies!

Thanks to both Pitch Perfect 3 and the latest New Year’s Rockin’ Eve, Britney’s “Toxic” is now stuck in my head, so that’s fun.

britney-stewardess-toxic
I think I’m ready now… for 2018.

Some of you may not know this, but I’m a big movie nerd.  Not like the weird “knows a bunch of obscure factoids about all the Tarantino movies” or whatever.  But I do quite enjoy the behind-the-scenes / making-of stuff — what used to be called “bonus features” on DVDs (remember those things??) — whenever the wife will tolerate sitting thru them with me.

Our genre of choice is generally horror.  Now, I’m gonna get nerdy on you for a bit, because there are several sub-types of horror, and I enjoy almost all of them.  Campy, creepy, fun, found-footage, gory, spooky, slasher, supernatural, tense, psychological, revenge, deconstruction, possession.  For the uninitiated, “deconstruction” is like 2012’s Cabin in the Woods — it pokes fun at the tropes while building on them in unique ways.  Those are some of my favorites; that one in particular is definitely in my top 10 all-time.

So to kick off this year, before diving back into the technical stuff, I’d like to give you a couple lightning reviews of some horror movies we’ve watched that are perhaps underrated, or that you may have missed.

  • The Babysitter (2017) – comedy/deconstruction. A young preteen boy, whose parents are gone a lot, has a great friendship with his older teen babysitter, but one night decides to spy on what she and her friends do after he goes to bed. And well, crap hits the fan.  Lots of fun, eye candy, and slapstick violence. 👍👍
  • Patchwork (2015) – campy/revenge. 3 girls are Frankenstein’d together and have to overcome their mental differences and physical struggles to piece together the perpetrator and hopefully exact some revenge. Superbly acted by the lead lady, plenty of violence and just enough funny bits to keep it going. 👍
  • Happy Death Day (2017) – slasher/deconstruction. Think Groundhog Day but with a college chick being killed by a masked marauder repeatedly.  She must try to find out who it is before it’s too late!  Somewhat predictable but still entertaining and engaging. 👍
  • Incarnate (2016) – possession/supernatural. A somewhat unique twist on the genre, a brain doc frees people from possession by mind-sharing & getting the person back in control of their own consciousness.  Think Inception meets Exorcist.  Very well-acted, convincingly scary demon, and nicely twisted ending. 👍👍
  • Demonic (2015) – creepy/found-footage. Bit of a misnomer, as it has nothing to do with demons; it’s about a ghost-summoning gone horribly wrong resulting in the deaths of all but 1 (ish?) member of the group that originally attempted said ritual.  Frank Grillo is always on-point.  Very engaging. 👍
  • Last Shift (2014) – gory/creepy/demon-y. Rookie cop gets stuck with the last watch in a soon-to-be-shut-down police station, chaos ensues.  Literally, this is some crazy crap; scary and bloody.  Original & vastly under-hyped, has an indie vibe but looks & feels professional-grade. 👍👍

Most of these should be stream-able.  So check ’em out!

Now on to the SQL stuff.

A not equal a
borrowed from the man himself, Pinal Dave =)

Collations are Hard

If you ever have to integrate a vendor database into your existing environment, and the vendor ‘mandates’ their DB use a certain collation (which differs from the rest of your SQL instances / databases), run away screaming.  Srsly.

Or convince your managers that you know better, and force it into the same collation as everything else you have to integrate with.  Good luck & godspeed.

Let me give you an example.  The ERP system is being upgraded, which of course means a new (upgraded) DB as well.  Part of this upgrade seems to involve supporting case-sensitive searching/matching against name fields.  To this end, the vendor insists that the DB should use a case-sensitive collation, namely Latin1_General_100_CS_AS.  Problem is, the rest of your DB environment, in which a lot of stuff touches the ERP database (via joins, linked-server queries, etc.), uses the SQL default collation of SQL_Latin1_General_CP1_CI_AS.

If you follow the vendor’s mandate (er, “recommendation”), guess what’s going to happen to your queries/views/stored-procedures that touch this & other DBs?  Horrible things.  Terrible performance degradation.  Wailing and gnashing of teeth from the developers, business users, and customers.

Okay, I exaggerate.  Slightly.
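
Here’s a bare-bones illustration of what I mean (the database, table, & column names below are all invented):

-- made-up databases: ERPDB uses Latin1_General_100_CS_AS, everything else uses SQL_Latin1_General_CP1_CI_AS
SELECT o.CustomerID, c.SourceSystem
FROM OtherDB.dbo.CustomerStaging AS c
JOIN ERPDB.dbo.Customer AS o
    ON c.CustomerName = o.CustomerName;
-- Msg 468: Cannot resolve the collation conflict between
-- "Latin1_General_100_CS_AS" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.

-- the typical band-aid is a COLLATE clause on one side of the join...
SELECT o.CustomerID, c.SourceSystem
FROM OtherDB.dbo.CustomerStaging AS c
JOIN ERPDB.dbo.Customer AS o
    ON c.CustomerName COLLATE Latin1_General_100_CS_AS = o.CustomerName;
-- ...which compiles & runs, but that conversion can rob the optimizer of index seeks on the converted column.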

But it really does hurt performance, and I don’t feel like it’s talked about enough in the data professional community.  In the next post, I’ll take this problem apart a little more and try to impart some of what I’ve learned from going through the pain of dealing with the aforementioned example.

Happy 2018!

PS: Apparently this is my 50th post!!  Go me!  :o)

50-cent-face-on-50-dollar-bill
fiddy. fiddy posts.

TSQL Tuesday #96: Good Influences

This month’s invitation is brought to you by Ewald Cress (blog, twitter), who I already like based on his tagline —

finds joy in minutiae..

Yes, my friend, don’t we all.

elaine-curious-why-didnt-use-exclamation-point
I can’t spend the rest of my life coming into this stinking apartment every 10 minutes to pore over the excruciating minutiae of every single daily event!

The topic at hand is fairly non-technical, but still important: folks who have made a positive contribution to your career or professional development.  So it’s time for a shout-out!  About a year ago, I wrote about my first major career move.  There were several great influences in my first job, from the developers that taught me how to code, to the DBA who taught me how to keep cool & calm in the face of outages, to the boss who taught me the importance of time management and breadth of knowledge.

Post-mortem

Since I am way too late in posting this, and I don’t feel like waxing poetic, I’ll just say a general “thank you” to all those who’ve helped me along in my career so far, with special acknowledgement to my former boss, my current boss, the SQL family, and my own family.  Happy belated Thanksgiving and have a safe & pleasant holiday season!  I’ll have a real post again quite soon, diving back into the tech stuff.

food-coma-happy-belated-thanksgiving
hope they don’t mind me borrowing their image… =D

TSQL Tuesday 95: Big Data

This month’s party brought to you by Mr. Hammer (b|t).

mc-hammer
No, not THAT one…

I apologize in advance for all the hammertime memes.  It was just too good to pass up.  Surely he must be used to this.  Or at least not surprised by it.  =D

So, Big Data.  What is it?  Well, in simple terms, it’s the realization and acceptance of the fact that data is multi-model, multi-faceted, multi-sourced, and constantly growing.  It’s the fact that the traditional RDBMS is no longer the be-all end-all source of truth and valuable information.  It’s part of a larger ecosystem involving JSON document stores, CSV files, streaming volatile bits of data coming from random devices and user activity that loses its meaning and potential impact almost as quickly as it can be gathered and sifted and stored.

But what do we actually get out of it?  As a small-medium enterprise NOT in the software business, I have to say, not as much as the hype would have us believe.  And look, I’m not so jaded and crusty that I refuse to adopt new tech.  I just haven’t seen a meaningful, transformative business use-case for it.  Sure, we have Google Analytics telling us how our websites are doing, and someone in marketing knows something about trending our social media traffic.  Does it really help us make more money?  Heck if I know.

cease thy actions, my timepiece has indicated the necessity of mallets
Old-timey colonials can even dig it…

Here’s what I’d like to see from the thought leaders.  Give me something I can chew on — a real-world, non-hypothetical, non-frivolous, impactful use-case for adopting and implementing something like Hadoop/Spark or Azure Data Lake.  Show me how my business can realistically journey down the path of predictive analytics and what it’s going to take from our Devs, IT staff, and management to actually get there.

Because they don’t get it yet.  I have managers still worrying about how much we’re spending on a dinky little flash storage array to support the growing needs of our on-prem converged infrastructure stack.  Meanwhile the AWS bill continues to baffle, and Devs want to play with Docker and Lambda.  But we can’t seem to convince the higher-ups that they’re short-staffed on the internal-apps team, even after a minor version upgrade takes 4 hours of Ops time and half a dozen end-users doing post-mortem testing just to be sure we didn’t break anything unexpected.

I’m not here to complain.  Really.  I do want to see something amazing, something inspiring, something that shows me what Big Data truly brings to the table.  And sure, I’ve seen the vendor demos; they’re all just a bit outlandish, no?  I mean, they look really cool, sure — who doesn’t want to see a chord diagram of who’s killed who in GoT? — but does that really help my business improve sales and productivity?

My point is, there’s a gap.  A chasm of misunderstanding and mis-matched expectations between what management thinks Big Data is/means, and what it takes to actually implement.  They see the pretty pictures and the fancy demos, but they don’t see the toil and sweat (or at least, in the cloud, gobs of cash) that go into building & operating the underpinnings and pipelines that drive those nice graphics.  Not to mention the fundamental issues of data quality and governance.

continue not, time for hammer it is
OK OK, last one, I swear…

So do us a favor, Big Data pundits.  Show us something real, something that "the little guy" can use to up his/her game in the market.  Something that makes a positive impact on small non-startup non-software businesses with understaffed IT & Dev teams.  But more importantly, stop glossing over the effort and resources that it takes to "do Big Data right".  Managers and executives need to understand that it’s not magic.  And IT practitioners need to understand that it’s actually worthwhile.  Because I believe you — really — that the payoff in the end is there, and is good.  But you need to convince the whole stack.


PS: I know this is a full day late for T-SQL Tuesday, and as such, I wasn’t going to post a ping-back in the comments of the invite, but then I saw there were only 8 others, so I felt it would benefit the event if I did add my late contribution.  I’ll tweet with a modified hash-tag instead of the standard #tsql2sday, to reflect my late-ness.  Hopefully that’s a fair compromise to the community & the event’s intentions.  =)

Dirty Laundry

It’s time for a more thought-y, less tech-y post.  Which is mostly my excuse for not wanting to write a bunch of code at the moment.  But that’s how I started this blog, with mostly opinion pieces, trying to offer some critical thinking on how DBAs and Developers work together.  So y’all better like it!

Today’s title is brought to you by Don Henley’s tune of the same name, which is now stuck in my head, thankyouverymuch.

dirty laundry goes in a basket not in a database
Paint.net is my friend… =D

This is about data quality.  When you have “dirty data”, just like dirty laundry, and you let it sit unattended, it starts to smell.  In software, this means the “badness” seeps into other areas of the environment, affecting systems and business processes that should otherwise function smoothly.

A code smell is a surface indication that usually corresponds to a deeper problem in the system.

-Martin Fowler

And, more aptly:

Data quality is corporate America’s dirty little secret.

-Paul Gillen

But what is dirty data?  Generally, it’s anything that doesn’t quite fit the ideal data model — that perfect vision of how all the bits of information in the system fit together, the shape of each data entity and how they relate to each other.  Mostly, dirty data is what happens when you allow users to type things into text-boxes, and you write those text-box contents straight into the database without any layers of validation or cleansing.  (Coincidentally, that’s also how SQL injection happens, but most of us have been scared-straight by enough years of security bloggers hammering at our thick skulls — and our favorite XKCD — that we at least sanitize our inputs before dumping them to an INSERT statement.)
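
(And for the record, "sanitize" here really means "parameterize".  A minimal sketch, with a hypothetical dbo.Customer table:)

-- hypothetical dbo.Customer table; the point is that user input travels as a parameter, never as SQL text
DECLARE @FirstName nvarchar(50) = N'Robert''); DROP TABLE Students;--';   -- hi, little Bobby Tables
EXEC sys.sp_executesql
    N'INSERT INTO dbo.Customer (FirstName) VALUES (@FirstName);',
    N'@FirstName nvarchar(50)',
    @FirstName = @FirstName;
-- the "payload" lands in the column as harmless text instead of ever being executed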

Let me take a recent example from my experience.  We have an ERP system that doubles as our CRM system (which is already a pair of bad idea jeans).  How do you think customer information gets into the database?  Customer Service Reps, typing stuff.  Usually by copying from a paper form.  Or the customers themselves, using an online form.  But guess what doesn’t happen in either case?  If you said “USPS address validation“, give yourself a hand!

joker give yourself a clap
Oh goooood for youuuuuu…. </Christian Bale>

Now, being that this system is our “source of truth” for customer info, it stands to reason that lots of other business functions & processes depend on it.  For example, let’s say we send a promotional calendar to our customers of a certain “subscription level” on a yearly basis.  We’re not in the publishing business, so we contract this out to another company.  But guess what they need from us in order to complete the job and mail out those calendars?  Addresses!  So what happens when there’s a bad address in our database?  A calendar gets returned, wasted cost and materials.  Multiply that by a couple thousand and you start to turn a few heads in the C-suite.

Later, around the Marketing table, someone has a brilliant idea that they need to run a mail-merge to send out a gift-package to the top 100 customers.  So they ask the DBA for a list of said customers.  “Sure!  Here ya go, here’s a report.”  And then the complaints start coming in.

“These customers aren’t active anymore.”

Then tell your CS reps to mark them as inactive in the system.  But no, we don’t do that, we just write “inactive” in the FirstName field.

“These ones are employees.”

Fine, figure out a special indicator to add for that, so I can exclude them from the report.  But no, of course, we can’t do that either; we just put “deactivated” in the FirstName field.

“This guy is dead.”

Yeah, not even kidding.  Apparently the powers-that-be decided to keep his info in the system, but type in “deceased” to the “Address 2” line (in the US, this is customarily the apartment/suite/unit number).

he's dead jim
Let’s beam him back up but write “deceased” on his badge, that’ll be sufficient.

But mostly, the biggest complaint is that we’re getting un-deliverable/return-to-sender when we try shipping out to some of these addresses.  And why?  Because they’re not subject to any external validation and quality-control.
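
Want to see how deep your own laundry pile goes?  A quick audit query can be an eye-opener; the table & column names here are hypothetical, so adjust to your own schema:

-- rough sketch: surface the records where "status" words got typed into name/address fields
SELECT c.CustomerID, c.FirstName, c.LastName, c.Address2
FROM dbo.Customer AS c
WHERE c.FirstName IN (N'inactive', N'deactivated')
   OR c.Address2 LIKE N'%deceased%';
-- every row returned is a candidate for a real status flag (IsActive, IsDeceased, etc.) instead of a free-text hack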

So what’s the data professional’s responsibility in this?  In my opinion, it’s to advocate for data quality.  There are obviously big vendors out there like Melissa Data who will sell you a service to help get you there.  APIs abound, from USPS and other official sources, so building it isn’t out of the question.

One potential roadblock is, as usual, conservatism.  The business’s ERP system is its life-blood, highly sensitive to change and very guarded by over-protective management and finicky executives.  But the smelly dirty data-laundry continues to cause problems and has real-money impacts on corp. efficiency and profit.  Unfortunately, many people tend to take the ostrich approach.

if you bury your head in the sand your ass will get burnt
No idea who this Bennett person is, but they sound smart.

So, my good people, start “doing your laundry”.  Have those conversations with your teams and managers about the current state of your data quality, and what it’s going to look like moving forward.  Make some plans, have a road-map, and understand that it’s going to involve a lot of collaboration between key players.  And good luck!

Quickie: TempDB on local SSD

What could possibly go wrong? As it turns out, plenty.

Faithful reader(s), it’s been a while!  I’ve been busy preparing for some big transitions.  I’m also getting better at MDX queries, tweaking SSAS-based reports to more accurately reflect the business rules.  But enough about that, on with the post!

In which we doubt the SAN

A storage area network (SAN) is a management & administration solution, not a performance solution.

-someone wiser than me

SANs are wonderful technology.  They inspire all kinds of geekery and are purported to solve all your storage woes.  But there’s a catch: they’re expensive.  Not just as a capital expense, but in maintenance and licensing costs.  And if you ever want to upgrade it, like add some more drives to a particular tier/pool — fuhgeddaboudit.

So what do we do with SQL on a SAN?  Well, it has tiers, right?  Slower storage with huge capacity, faster storage with less, etc.  We put the data files (heavy random read workload, typically) on the pool optimized for that kind of I/O pattern.  We put the TLog files (heavy sequential write workload) on the pool best suited for that.  And what about good ol’ TempDB?  Its access pattern is fairly unique — random writes and reads, and frequent overwrites, which means it could potentially wear out your typical prosumer SSD relatively quickly.  But we’re not complete cheapskates, we’ll buy enterprise class SSDs, no?

So we go read some stuff and figure, hey, sounds like a great idea, right?  Put TempDB on a local SSD, or better yet, a pair of SSDs in RAID-0 for pure performance (because this is a cluster, we’ve got HA already).  We’ll reduce the load on the SAN I/O channels and make our overworked TempDB happier with lower latency and better throughput.  Right?

what could possibly go wrong
ooh, sparkly!

 

In which we discover what could possibly go wrong.

Once the new drive(s) is(are) installed and “presented” to Windows (that’s my SysAdmin’s term), it’s fairly trivial to do the SQL configuration change — it does of course require a SQL service restart (or cluster failover).  Code example, assuming your new drive is ‘T’:

use master;
alter database tempdb
    modify file (name=tempdev, filename='T:\tempdb.mdf');
alter database tempdb
    modify file (name=tempdb2, filename='T:\tempdb2.ndf');
--etc. for the rest of your TempDB data files; the new paths take effect at the next service restart/failover

You do of course have multiple TempDB data files, yes?  Good.
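
(A quick way to check, if you’re not sure:)

-- list TempDB's current files & paths, just to be sure
SELECT name, physical_name, type_desc
FROM tempdb.sys.database_files;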

Side-debate

Should we put templog (TempDB’s transaction log) on the same drive as the TempDB data files, or put it on the same storage pool as the regular DBs’ TLogs?  As usual, “it depends” — ask your favorite SQL gurus and do some testing.

Back on topic

We’ve made the change, we’ve done the cluster failover.  TempDB is now running on our spankin’ new SSD.  So we start monitoring performance metrics.  Things like file I/O stats (from SQL DMV sys.dm_io_virtual_file_stats), latency and waits (from our monitoring tools), and good ol’ PerfMon.
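
For reference, the DMV query looked roughly like this (a sketch, not our exact monitoring code):

-- per-file read/write latency for TempDB, derived from cumulative stats since the last restart
SELECT mf.name AS [file_name],
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id     = vfs.file_id;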

But wait, what’s this?  I/O stalls are higher?  Write latency is higher?!?  Perfmon agrees?

nooooooooooooo-vader
why god why?!?

Write latency on the TempDB files was over 10x higher than it was when they were on the SAN (the performance tier, to be clear).  The file_stats DMV showed large increases in I/O stalls.  Sad-trombone.

 

In which we have several theories

Then ensued various conversations and brainstorms among my colleagues.

Someone check the firmware/drivers!

It’s up-to-date.

Maybe it’s got the wrong block-size.

Nope, 64k.

Well, it’s only 6Gbps SAS… maybe we should’ve sprung for the 12Gbps.

The write latencies went up by a factor of 10.  I don’t think an improvement by a factor of 2 is going to win you any trophies.

Why didn’t we get an NVMe or M.2 one?

Because the damn blades don’t have those slots, goober.

Another interesting observation, and potentially the silver lining.  Overall instance waits (wait stats), according to our monitoring tool, went down.  That’s good news, right?  Maybe.  Does application performance & user experience corroborate it?  Possibly!  We’ll be observing the patient for another week or so.

Let’s turn to the community again to see what others have experienced.

In which we eat some crow

And by “we” I mean “me”.  Being the DBA and the primary proponent of the SSD addition, because I knew our workloads were very TempDB-heavy, I had to tuck tail and admit that the SAN gods won this round.

Maybe.

But wait, what about the fact that our wait stats are down?  What about app/user experience?  Valid arguments, I agree.  That’s why we’re still observing.  But I’m not optimistic, given the follow-up links above.  We may utilize local SSDs for something else (index filegroups?) — but if those write latencies don’t improve, I’m concerned that it won’t help anybody.

keep calm because only time will tell
And beer. Lots of beer.

In which I ask for your help

Yes, you!  If you have ideas on what we did wrong, what we’re missing, or any other advice about getting the most “bang for the buck” out of a direct attached SSD on a converged-infrastructure Cisco UCS blade server platform with a VNX SAN, by all means, drop me a line.  I’m all ears.

TSQL Tuesday 93: Interviews

This month’s event is hosted by the fabulous DBA/SQL-consultant (& part time cartoonist) Kendra Little! Go check out her blog, podcast, and training at sqlworkbooks.com – really great stuff.

DBA interviews are tricky.

The problem isn’t so much that the role is vaguely defined. Although, depending on the size of the IT org and the tech stack, it can vary widely from a jack-of-all (DB Dev, report writer, production ops, the works) to a highly specialized performance tuner who works with 7 other teammates, each of whom has a unique surgical specialty in the data platform. But that’s not the problem — well, not the main problem. It is a problem in the sense that the business folks, especially HR, are notoriously and astonishingly ignorant of what a DBA or related role actually involves. But you get past that once you start talking to the tech leads and IT directors.

No, the problem is that, like most higher level technical roles, you don’t really know how a candidate is going to function in it (the role) without actually seeing him or her.. IN it. Do they keep a cool head when production goes down? Do they have a solid plan of attack for the dreaded “everything is slow!” complaint-storm? Do they have a good handle on HA & DR architecture & implementation? Can you rely on them to actually practice and follow thru with those strategies? Can they be a continuous learner and keep abreast of new developments while still tempering that with wisdom & maturity, applying the correct tools to the proper problems? Do they add value to the team and organization by both teaching and learning from others?

These are truly difficult, complex questions that are nearly impossible to deeply assess and fully answer during an interview process. Largely because the only real evidence of their answers lies in actual experience. Sure, a cert here or an MVP there definitely helps your case. But at any rate, we try our best to chip away at the boulder.

aerosmith chip away at the stone record label
it was a record. from the 70s. it’s the best i could come up with. =P

Pivoting to a more positive note, I’ll share some of the better questions that I’ve experienced during my career so far.

Some good examples.

How would you design and build a data copy/sync process across/between tiered environments, say DEV-QA-PROD?

Really great question.  This is a common problem in small-to-medium enterprises with legacy systems where DevOps hasn’t quite reached down to the depths of the internal application stacks and people are still dealing with “refresh cycles” on the order of months, quarters, or even years.  You can approach it purely from a tooling perspective, but that’s not the whole picture.  Thus, it calls for some thought and team-culture ideas as well as “knowing the nerd-knobs”.

We have a complex process flow that involves a lot of stored procedures, say 50 total.  Some of these are non-sequential, meaning they can be executed in arbitrary order, while others need to be sequenced with each other in “blocks”.  This is a vendor product, so ultimately, the customer gets to decide the schedule and order of execution of this flow.  The solution needs to be maintainable by field engineers.  How would you handle this?

Woah.  Talk about diving down a rabbit-hole.  This is interesting in the sense that it exposes a bit of the architecture and some of the potential pain-points that the team is hoping to solve, while leaving enough room for improvement and experimentation by the hopeful candidate.  More to the point, it’s just an example of a more general technique, which to me is very effective: taking an architectural problem that actually comes from the “real world” (the company/team that’s interviewing) and asking for the candidate’s ideas on how to solve it.  You don’t need to get in-the-weeds super-detailed about it, but outlining your ideas helps indicate how you think about complex challenges and shows what kind of value-add you would bring to the team.

And finally, a perennial favorite:

Tell me about a time you broke production, and more importantly, how you addressed and resolved it.

So many stories from the trenches involve downtime and mistakes, it’s good to ‘bond’ over them.  It helps bring the egos back down to earth, and reminds us that we’re all just meatbags, making technology to do our bidding, occasionally to our own regret.  It shows the candidate’s “pressure cooker” mentality, or at least, what they tell you about it.

server did what you instructed it to...you don't say?
Don’t ya hate it when…

In conclusion.

If you’re a DBA, Dev, or IT pro, help your managers better understand your team’s needs when it comes to hiring.  Get involved in the job description write-ups and screening process questionnaires.  Barge your way into those ivory towers, if you have to — or you’ll regret the time you waste on candidates who really belong in a different role than the one you’re after.

If you’re a manager, PLEASE LISTEN to your reports and tech leads.  They know what makes a good team member, they’ve been doing it for a long time.  Don’t dismiss their advice or block them from being part of the hiring process — yes, they are busy, and yes, they can be crotchety, but their input is highly valuable toward bringing in effective & productive talent.

That’s all folks!

PS: I know I missed the “deadline” by about an hour..ish.  I blame DST.  Heck, it’s still Tuesday for the majority of the Western hemisphere.  I’m not biased, but I write in English, so… ya know.  Take it as you will.  Now excuse me while I go hide from the blog-police in my ASCII-bunker.

*\ |* ___________
\| ___/ I___I___I_\___
/___I___I___I___I___\ Incoming!!
/__I___I___I___I___I__\ /
___ /_I___I___I___I___I___I_\ ___
/ \ __ /____I___/ \___I____\ __ / \
/ \ / \ _ /___I____/ O \____I___\ _ / \ / \
/ \_/ \_/ \__/__I___I__|____^____|__I___I__\__/ \_/ \_/ \


Automating SQL Installation

..while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort..

At this point in my career, I’m not managing massive environments, so I don’t feel the need (nor have the expertise) to use a large scale solution like DSC or SCCM.  But I’ve had to install SQL Server a few times, so I figured it’s worth at least scripting out a standard pre-configured installation, so that A) I don’t need to click through a GUI ‘wizard’ hearkening back to the ’90s, and B) the SysAdmins can “fire and forget” (read: stop bugging me about it).

keep it simple stupid
the patented one-eyebrow-raise..

The Disclaimer

Thus, I’m attempting to K.I.S.S., while making it configurable & repeatable.  There are some limitations of this approach, as alluded above.  It’s not “massively scalable” (scaleable? scale-able?) because:

  1. The PoSh script still needs to be deployed locally to the server in question
  2. The installer config (.ini) also lives locally (though it probably could be a UNC path, it’s just a file after all)
  3. The script prompts you for the service account (SQL engine, Agent) credentials and the sa password using the Read-Host -AsSecureString cmdlet, so some meatbag still has to type those in.  This is because we don’t have an enterprise pwd/secret-management system where I could, say, ask it for a service account credential set and tell it to embed that securely in a script without it actually being visible to me.  So, while yes, they’re kept in a “vault”, it’s not query-able by anything else, so an admin still needs to copy & paste them into whatever configuration screen he’s working with at the time.  Not ideal, I know, but we work with what we’ve got.

PS:  Yeah, yeah, “don’t use sa, rename it or disable it; or use Windows Auth only!”.  Rage, howl, fire & brimstone.  I’m not going to argue about it; we can save that for another post.  This environment dictates that it’s used during setup and then disabled later, so that’s beyond the scope of the installer config.

So yes, while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort for the occasions when a new SQL box/VM needs to be spun-up.

Useful links

  1. A primer on SQL cmd-prompt installation & its arguments
  2. A couple community articles on the subject (the latter about slipstreaming updates)
  3. A technet article & couple Q&A threads (technet, stackoverflow) that helped me figure out how to securely get & put the credentials
  4. An example for mounting an ISO in PowerShell
  5. And finally, two things that I attempted to understand but ultimately failed to implement, because (apparently, at least to me), PowerShell remote-ing is a P.I.T.A.
config.ini, to command prompt, to PowerShell
3 steps toward a better workflow

The Outline

First we need an .ini file to work with.  You could either create it from scratch, or take it from an existing SQL box’s “Setup Bootstrap” folder.  Example path C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\20170801_073414\ConfigurationFile.ini​  — indicating this was an install done on 8/1/2017 at 7:34am.  Right above that, at simply C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\, you’ll see a Summary.txt file, which can actually come in handy while you’re testing these unattended installs and wanting to see why it failed.

The first link above, from MSFT Docs, does a pretty nice job of telling you all the things that make up this config file.  You get to bypass the TOS prompt, enter service account details, specify drive letters (paths) for default data/log file locations & tempdb, slipstream update packages (UpdateSource​), and even more advanced stuff like AG settings and whatnot.  My example will be a simple standalone instance using the default name, so I’ll be sticking with the basics.

We can then use this file in the ConfigurationFile argument of setup.exe from the SQL Server install media.  To put a little more color on that: the .ini file is really just a collection of command-line arguments to setup.exe​; you could also list them all out in-line, but that would be tedious and silly.  Here’s a couple major selling points of creating your own config file:

  1. Slipstream updates (SP’s, CU’s), instead of having it go out to MSFT update servers (or *aghast* sticking with the original RTM bits, you heathen you!)
  2. Specify drive letters / default file locations: sure, this may be considered old-hat if you’re running super slick storage, but I still find it makes management a bit easier if I know where my MDFs, LDFs, TempDB, & backups will always be.
  3. Take advantage of 2016’s better TempDB setup options (# files, size & growth)

We will, however, keep a couple arguments out of the .ini file and instead throw them into the ArgumentList from the calling PowerShell script.  Speaking of, here’s what the PowerShell script needs to do:

  1. Prompt the operator (SysAdmin or DBA) for the SQL & Agent service account credentials, and (optionally) the sa pwd (if using it).
  2. Fetch our install media from the central network share where we store such things (server & office ISO​s, for example).
  3. Mount said ISO to our virtual disc drive.
  4. Run its setup.exe with the following arguments:
    1. The config .ini file
    2. The service & sa accounts
  5. After it’s done, un-mount (dismount) the ISO.

Then the DBA can connect to the brand-spankin’-new running SQL instance and do other post-setup configurations as desired (i.e. set max-memory, maxDOP/CTFP, etc).  And sure, those could also be done in PowerShell (thanks in no small part to the awesome team at DbaTools), but I chose not to do so in this case.

As the bloggers say, “that’s left as an exercise to the reader”.

Plus, they’re never quite as deterministic as we’d like them to be — they depend on the server’s compute resources, i.e. memory size & CPU cores, as well as estimated workload & environment tier, so it’s often a gamble in “how correct” your initial settings will be anyway.  Still, anything is better than the defaults, so configure-away!
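
(If you’re curious what “configure-away” boils down to, it’s mostly a handful of sp_configure calls.  The values below are placeholders, which you’d obviously tune to the box at hand.)

-- placeholder values; tune these to the server's RAM, core count, and workload
EXEC sys.sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sys.sp_configure 'max server memory (MB)', 28672;          -- leave headroom for the OS
EXEC sys.sp_configure 'max degree of parallelism', 4;
EXEC sys.sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;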

husky puppies sharing
because sharing is caring!

The Code

Here are the Gists I’ve created to go along with this post.  If I’ve made a mistake, or if you, dear reader, have a suggestion, we can incorporate them into the gist without me having to go back and edit the blog post!

Yay technology!

I’d love to get feedback on how you would improve this, what you might do differently, etc.  Drop me a comment or a tweet!

Config/INI file:

; AUTHOR: Nate Johnson, @njohnson9402/@natethedba, natethedba.wordpress.com
; LICENSE: https://choosealicense.com/licenses/unlicense/
; TYPE: SQL Server 2016 Configuration File (for command-line installation)
; DESCRIPTION:
; Inline comments are mostly copied from existing file generated by install wizard, with clarification where necessary.
; I use drive D:\ for data files (MDF), L:\ for transaction logs (LDF), T:\ for TempDB, and X:\ for backups.
; Most other options are "normal", i.e. I don't deal with clustering, Availability Groups, or other exotic things.
; I am only installing the database engine and replication components; see FEATURES option for more.
; Read the corresponding blog post at https://natethedba.wordpress.com/automating-sql-installation/
; USAGE:
; There are THREE static placeholders you need to replace/type when you use this:
; 1. <YOUR UPDATES/PATCHES FOLDER>, which is for slipstreaming whatever service-pack(s) and cumulative-update(s).
; 2-3. <YOUR DOMAIN> and <YOUR DBA GROUP>: for specifying domain account(s) which are given SQL sysadmin ('sa') rights,
; hence, usually your DBA group. You could move that part out to the calling script instead, if you wanted,
; but I chose to leave it here because it's not likely to change much within an environment.
; required first line
[OPTIONS]
; Specifies a Setup work flow, like INSTALL, UNINSTALL, or UPGRADE. This is a required parameter.
ACTION="Install"
; Specifies that SQL Server Setup should not display the privacy statement when ran from the command line.
SUPPRESSPRIVACYSTATEMENTNOTICE="True"
; By specifying this parameter and accepting Microsoft SQL Server terms, you acknowledge that you have read and understood the terms of use.
IACCEPTSQLSERVERLICENSETERMS="True"
; Use the /ENU parameter to install the English version of SQL Server on your localized Windows operating system.
ENU="True"
; Setup will run silently, logging to files in the setup-bootstrap directory
Q="True"
; Specify whether SQL Server Setup should discover and include product updates. The valid values are True and False or 1 and 0. By default SQL Server Setup will include updates that are found.
UpdateEnabled="True"
; If this parameter is provided, then this computer will use Microsoft Update to check for updates.
USEMICROSOFTUPDATE="False"
; Specifies features to install, uninstall, or upgrade. The list of top-level features include SQL, AS, RS, IS, MDS, and Tools. The SQL feature will install the Database Engine, Replication, Full-Text, and Data Quality Services (DQS) server. The Tools feature will install shared components.
FEATURES=SQLENGINE,REPLICATION
; Specify the location where SQL Server Setup will obtain product updates. The valid values are "MU" to search Microsoft Update, a valid folder path, a relative path such as .\MyUpdates or a UNC share. By default SQL Server Setup will search Microsoft Update or a Windows Update service through the Window Server Update Services.
UpdateSource="<YOUR UPDATES/PATCHES FOLDER>"
; Displays the command line parameters usage
HELP="False"
; Specifies that Setup should install into WOW64. This command line argument is not supported on an IA64 or a 32-bit system.
X86="False"
; Specify a default or named instance. MSSQLSERVER is the default instance for non-Express editions and SQLExpress for Express editions. This parameter is required when installing the SQL Server Database Engine (SQL), Analysis Services (AS), or Reporting Services (RS).
INSTANCENAME="MSSQLSERVER"
; Specify the root installation directory for shared components. This directory remains unchanged after shared components are already installed.
INSTALLSHAREDDIR="C:\Program Files\Microsoft SQL Server"
; Specify the root installation directory for the WOW64 shared components. This directory remains unchanged after WOW64 shared components are already installed.
INSTALLSHAREDWOWDIR="C:\Program Files (x86)\Microsoft SQL Server"
; Specify the installation directory.
INSTANCEDIR="C:\Program Files\Microsoft SQL Server"
; SQL Agent account startup type
AGTSVCSTARTUPTYPE="Automatic"
; Level to enable FILESTREAM feature at (0, 1, 2 or 3).
FILESTREAMLEVEL="0"
; Specifies a Windows collation or an SQL collation to use for the Database Engine.
SQLCOLLATION="SQL_Latin1_General_CP1_CI_AS"
; Set to "True" to enable instant file initialization for SQL Server service. If enabled, Setup will grant Perform Volume Maintenance Task privilege to the Database Engine Service SID. This may lead to information disclosure as it could allow deleted content to be accessed by an unauthorized principal.
SQLSVCINSTANTFILEINIT="True"
; Windows account(s) to provision as SQL Server system administrators.
SQLSYSADMINACCOUNTS="<YOUR DOMAIN>\<YOUR DBA GROUP>"
; The default is Windows Authentication. Use "SQL" for Mixed Mode Authentication.
SECURITYMODE="SQL"
; The number of Database Engine TempDB files.
SQLTEMPDBFILECOUNT="4"
; Specifies the initial size of a Database Engine TempDB data file in MB.
SQLTEMPDBFILESIZE="1024"
; Specifies the automatic growth increment of each Database Engine TempDB data file in MB.
SQLTEMPDBFILEGROWTH="64"
; Specifies the initial size of the Database Engine TempDB log file in MB.
SQLTEMPDBLOGFILESIZE="1024"
; Specifies the automatic growth increment of the Database Engine TempDB log file in MB.
SQLTEMPDBLOGFILEGROWTH="64"
; The Database Engine root data directory.
INSTALLSQLDATADIR="D:"
; Default directory for the Database Engine backup files.
SQLBACKUPDIR="X:"
; Default directory for the Database Engine user databases.
SQLUSERDBDIR="D:\Data"
; Default directory for the Database Engine user database log files.
SQLUSERDBLOGDIR="L:\Log"
; Directories for Database Engine TempDB files.
SQLTEMPDBDIR="T:\TempDB"
; Directories for Database Engine TempDB log files.
SQLTEMPDBLOGDIR="T:\TempDB"
; PS: yes, there is some debate as to whether your TempDB logs should go with your user DB logs, or with your TempDB data;
; I'm inclined to say the latter, because I like to use locally attached NVMe/M.2 flash storage specifically for it. But, YMMV.

PowerShell install script:

# AUTHOR: Nate Johnson, @njohnson9402/@natethedba, natethedba.wordpress.com
# LICENSE: https://choosealicense.com/licenses/unlicense/
# TYPE: PowerShell script
# DESCRIPTION/USAGE:
# There are TWO static placeholders that you need to change/type-in when you want to use this,
# they both start with <PATH TO …>. I also added a #CHANGE THIS! comment to the end of their lines.
# The first one is for your installer config .ini file
# (see other gist at https://gist.github.com/NJohnson9402/a3c13429a055771efd26eefa66c69d62).
# The second is for the location of your SQL server installation media (ISO).
# See corresponding blog post at https://natethedba.wordpress.com/automating-sql-installation/
# get account credentials to configure SQL
$sqlsvc = Get-Credential -Message "SQL Service account?"
$agtsvc = Get-Credential -Message "SQL Agent account?"
$sapwd = $(Read-Host -Prompt "SQL 'sa' login pwd" -AsSecureString)
# build up the argument list via string concatenation
$arglist = '/ConfigurationFile="<PATH TO your config file>.ini"' #CHANGE THIS!
$arglist += ' /AGTSVCACCOUNT="' + $agtsvc.UserName + '"'
$arglist += ' /AGTSVCPASSWORD="' + $agtsvc.GetNetworkCredential().Password + '"'
$arglist += ' /SQLSVCACCOUNT="' + $sqlsvc.UserName + '"'
$arglist += ' /SQLSVCPASSWORD="' + $sqlsvc.GetNetworkCredential().Password + '"'
$arglist += ' /SAPWD="' + [System.Runtime.InteropServices.Marshal]::PtrToStringAuto([System.Runtime.InteropServices.Marshal]::SecureStringToBSTR($sapwd)) + '"'
# this last crazy tidbit converts the SecureString to a plain string that is still "secure"
# in the sense that it can't be sniffed or dumped from memory by another process. At least, that's my understanding.
# mount the install image
$iso = Get-ChildItem -Path "\\<PATH TO your SQL Server 2016 Standard x64 iso folder>\" #CHANGE THIS!
Mount-DiskImage -ImagePath $iso.FullName
# get the drive letter of the mounted image to reference setup.exe
$setup = $(Get-DiskImage -ImagePath $iso.FullName | Get-Volume).DriveLetter + ":\setup.exe"
# run installer with arg-list built above, including config file and service/SA accounts
Start-Process -Verb RunAs -FilePath $setup -ArgumentList $arglist -Wait
# un-mount the install image when done after waiting 1 second (just for kicks)
Start-Sleep -Seconds 1
Dismount-DiskImage -ImagePath $iso.FullName

Know your Role

DBA does not mean “database archaeologist”, even though sometimes that’s what you end up doing…

Aka “Dammit Jim, I’m a DBA, not a data researcher!”  Or, as I stated on Twitter:

DBA != Database Archaeologist

penguin archaeologist
because… SQL ❤ Linux!

Today I spent more hours than I care to admit, tracking down some obscure data from two disparate systems in an attempt to reconcile what were supposed to be matching records.  Part of that is my own fault — I’m a sucker for interesting problems and edge cases, as I’ve blogged about before…

mostly just for the smug satisfaction of proving to the business that “your assumptions about how your data works are invalid“.

But mostly it’s because, the further back in time you go, the less reliable the data becomes.  Especially (exponentially) when that data originates from human free-form text input.

mr garrison manual inputs are bad
but GUI’s are OK.. and CLI’s are great!

Let’s contrive an example.  We have our core business product system, WidgetMaster, which tracks Widgets we ship out by WidgetNumber.  Our partner associate business runs an online widget exchange where people can buy and sell their Widgets in a sort of second-hand/after-market fashion.  PartnerExchange listings are supposed to include the WidgetNumber for ease of tracking and associating data between the two systems, even though they’re officially run by different companies (or in my case, different departments of the same company — yeah, think about that for a second).

Now, ideally, theoretically, those WidgetNumbers should always match up. But unfortunately, up until late 2014, our WidgetMaster system didn’t have an API that PartnerExchange could call to obtain a widget by its number; and even if we did, they have to at some level rely on the customer (or a worker) to read and enter that WidgetNumber into the exchange listing.  But wait, we started doing bar-codes back in 2010, so “most” of them are actually scanned from the bar-code, but not every customer has that capability, so there’s still a lot of hand entered data.

So we have some dirty data.  Let’s complicate things a bit. Over time, those widgets can come back to WidgetMaster for update/upgrade and then ship back out.  Again, ​WidgetNumber should remain consistent throughout that process.  Now, when PartnerExchange sells certain particular widgets, sometimes they’re part of a SuperSpecialCollection.  This collection spans many years, maybe even a decade or more. WidgetMaster got wind of this SuperSpecialCollection, being bought-up by Mr. HighRollerCustomer, so we started marking the incoming/outgoing records with a new property.

But it’s text.

It’s entered by the receiver, based on looking at the Widget’s buy/sell history in PartnerExchange.  And yes, the HighRollerCustomer who last bought the widget is aware that it’s part of their SuperSpecialCollection, but they aren’t guaranteed to specify that when they send the widget back in to WidgetMaster for upgrade.

Do we see the problem yet?

oh it gets better (again)
Yes, yes it does!

See, about 5 years ago, there was a reorg, and the dev team for WidgetMaster completely revamped the way in which “collection membership” for incoming widgets is designated/tracked. So now it’s over in some property table. To make matters worse, PartnerExchange renamed SuperSpecialCollection to AwesomeCltn a few years ago because they were tired of typing so many letters (and apparently fans of cryptic abbreviations).

Fortunately, PartnerExchange has done a decent job of at least storing the correct WidgetType and WidgetQuality in their listings, despite WidgetNumbers being fairly sparse.  But again, because over in WidgetMaster, we’re supposed to associate each WidgetNumber with the AwesomeCollection, we now have this secondary task of mapping unmatched WidgetNumbers across systems, by using Type and Quality from one side (partner) combined with Collection-membership from the other side (master), by assuming that the partner’s designation of SuperSpecial/AwesomeCollection is correct.

If your head’s not spinning yet, give yourself a round of applause. While rubbing your tummy and tapping your foot.

Needless to say, this is hard.  We’ll probably get the majority of records matched (mapped?) eventually by using a couple string LIKE predicates and some clever try/pass/retry flow, but it’s tedious at best.  Another bit of frustration will come up when we do a couple ad-hoc searches thru each system to attempt to apply reason and logic, i.e. find a pattern; and because we’ve already done the work, we might as well put that info into our results, even if it doesn’t show us a useful pattern by itself.
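
To give you a flavor of that first pass, here’s a sketch with entirely made-up table & column names:

-- first pass: PartnerExchange listings with no WidgetNumber, matched to WidgetMaster
-- records flagged (in free text, naturally) as part of the SuperSpecial/Awesome collection
SELECT pe.ListingID,
       wm.WidgetNumber AS CandidateWidgetNumber
FROM PartnerExchange.dbo.Listing AS pe
JOIN WidgetMaster.dbo.Widget AS wm
    ON wm.WidgetType    = pe.WidgetType
   AND wm.WidgetQuality = pe.WidgetQuality
JOIN WidgetMaster.dbo.WidgetProperty AS wp
    ON wp.WidgetID = wm.WidgetID
   AND (wp.PropertyValue LIKE '%SuperSpecial%' OR wp.PropertyValue LIKE '%Awesome%')
WHERE pe.WidgetNumber IS NULL;
-- later passes loosen (or tighten) those predicates for the leftovers; the stragglers go to a human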

So how do we approach this?  Well, as I said, I spent what I felt was too much time on it, but essentially I did an initial “majority rules” mapping attempt (first pass), followed by a few reconciliation attempts for the remainders.  Those consisted of fairly identifiable patterns with a couple outliers. With those outliers, as with the rest of the unmapped records at the end of the day, I had to tell the business, basically, “Here’s the majority of the results. You can assign a research specialist or analyst to the rest, if you feel it’s that important.”

I may have done this with slightly more attitude than necessary.

How could we improve this?  The bigger geeks in the room may pipe up with “machine learning!”  Ok sparky, do you have that infrastructure ready to go?  No?  How long for implementation?  mumble mumble something about Azure mumble…  Okay, sure, how about training the model so you can feed it your data?  Cool, well enjoy the incredulous laugh you’ll get when you tell the manager that.

How about other tool sets? Sure,  we could check out Python or R, write a C# app maybe?  Well guess what, we still need to look at the data to understand what patterns (or lack thereof) there are to work with.  And again, lead time.  Unfamiliar tools means longer development cycles.  And they’re really not guaranteed to produce any better results (more matches) at the end of the day, are they?

Because your data models and your analyses are only as good as the data itself.

data is the iceberg
Or is it “data are“?

And with that, I’ll call it a day.  Thanks for reading!

Clean code, the SQL: Part 2: Electric Boogaloo

I enjoyed the conversation, because it really got us both thinking more deeply about which areas of our app landscape are in better/worse shape than others.

After the previous discussion about nested views, encapsulation & abstraction, I’d like to write about duplication specifically, and the distinction between actual lines of code being duplicated, versus functional duplication. Because the latter is not OK, but the former is generally acceptable when it’s boilerplate code, or done, again, in the name of performance and efficiency.

boilerplate with alphabet and stuff
letters and numbers and symbols!

 

So, to expand on last week’s “Encapsulation & Abstraction” segment.  The conversation with one of my favorite developers went something like this.

Developer:

While I agree that there’s some over-reliance on nested views, the reason they get implemented a lot is because there’s a particular problem they seem to easily solve: how to encapsulate business-data rules without violating DRY.

Let’s say we have a biz-rule for a “core segment” of data.  For simplicity’s sake, let’s call that rule “Widget A consists of a Widget record and a WidgetSupplement record joined by WidgetID, where Widget.WidgetType is ‘foo’.”  So it seems obvious to create a view WidgetFooComplete, which pulls in the data from both tables and applies the type condition.  This creates a sort of “atomic building block” of data, which can be consumed by apps & data-access methods repeatedly & consistently.

Now, most downstream apps will use that WidgetFooComplete data in its entirety (or nearly).  But let’s say there’s a hot new app that needs some more data about the Widgets, and it needs to go out to the WidgetMoarProperties table.  The natural inclination is to incorporate our existing “building block” view, WidgetFooComplete, into a new view for this app & its dependencies, and call it WidgetFooMoarComplete.

But what’s the alternative?  If we re-create the JOIN/WHERE conditions on the base-tables in this new view, it violates DRY and makes possible future refactoring difficult if that biz-rule changes.

Admittedly, most modern data-access technologies make it easier to create these “building blocks” of joined data entities.  And sometimes those biz-rules belong in the app’s lower layers, but this can lead to writing lots of little disparate queries/db-calls for what should have been one atomic operation.  That can be a maintenance headache, as could dozens (hundreds) of tailored stored-procs for every data-access scenario.

So it seems like nested views can have their place, but “deep” nesting is usually troublesome.  And to prevent the “slippery slope” effect, we have to practice diligence.
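
To put some concrete SQL behind that, here’s a minimal sketch of the “building block” view being described (the supplement columns are invented for illustration):

CREATE VIEW dbo.WidgetFooComplete
AS
SELECT w.WidgetID,
       w.WidgetType,
       ws.SupplementNotes,        -- invented column
       ws.LastInspectionDate      -- invented column
FROM dbo.Widget AS w
JOIN dbo.WidgetSupplement AS ws
    ON ws.WidgetID = w.WidgetID
WHERE w.WidgetType = 'foo';

Whether the next app builds WidgetFooMoarComplete on top of this, or restates the join against the base tables, is exactly the trade-off we go back & forth on below.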

Me:

That’s pretty spot-on.  DBAs tend to criticize them (nested views) as a practice in general because of the tendency to over-use and over-rely on them, and because of that slippery slope, where “a little” use turns into “a lot”, and leads to troubleshooting headaches.  And generalizations are just that.

To take some examples in-hand: simple entity relationships, especially when biz-critical, should be A) obvious, and B) documented.  Unified views can serve this purpose, but should only be used where appropriate — i.e. to load an object that gets passed around/up the app stack.  They’re great “atomic building blocks” when you actually need the entire block of data.  But when you don’t — say in a stored-proc that’s doing some data flow operation and only needs a small subset of that data block — it’s probably better to get the relationship logic from the view and copy-paste it (but hopefully not all of it!), while omitting the stuff that’s not needed.

The main reason for this is usually index tuning.  If we’ve crafted some indexes to meet certain query patterns in certain troublesome procs, we want those procs to use those indexes, not just do a full table scan because they’re using a nested-view which does select * .

When we get to more complex business rules, we need to up our diligence game and be more mindful of dependency checking when planning for a rule change.  Proc comment-headers can be helpful here, as can tools that search thru SQL object meta-data and code-bases to produce dependency chains.
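
(On the SQL side, the built-in meta-data gets you part of the way there.  A rough sketch, using the contrived view from above:)

-- objects that reference our building-block view (only catches references SQL can resolve)
SELECT referencing_schema_name, referencing_entity_name
FROM sys.dm_sql_referencing_entities('dbo.WidgetFooComplete', 'OBJECT');

-- or search module definitions directly for a known biz-rule fragment
SELECT OBJECT_SCHEMA_NAME(object_id) AS [schema], OBJECT_NAME(object_id) AS [object]
FROM sys.sql_modules
WHERE definition LIKE '%WidgetType = ''foo''%';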

The main point is, duplication tends to be OK when it’s not functional duplication, i.e. when the SQL code is more-or-less similar in some places but it’s not exactly the same because the purpose (responsibility) of that module/stored-proc is not the same.

You’re right in that the “31-flavors of tailored procs for data-access” is a big maintenance headache, and sometimes that trumps even the performance concerns.  Again it’s about balance — we have to be mindful of both the biz-rule-maintenance concerns and the performance concerns.

Developer:

I figured.  Sometimes I see DBAs criticize developers’ work without seeming to understand that it doesn’t always come from sloppiness or laziness (although sometimes it does!).  Often, we’re trying to thread that needle of performance vs. maintainability.  In Dev-land, “lazy” is good in the sense of aiming for simplified logic, for ease of both maintenance and understanding.  Painstakingly tailoring each data-access call (stored-proc), while good for performance, is kinda opposite of that.  But, admittedly, we do fall back on SELECT * all too easily.

Mostly, we try to avoid code duplication because it leads to heavier maintenance overhead.  When some modules may perform similar operations, functionally, they will often re-use the same “core” logic, which we in turn encapsulate into its own ‘thing’.  But in SQL modules, as you say, that’s not always performant, so it’s definitely a tightrope-walk.

The “Clean Code” school of thought says, if it’s obvious, it’s “self-documenting”.  I don’t always agree with it, but it comes from maintenance concerns again.  We don’t like situations where someone tweaks the code but doesn’t update the comments, and you end up with misleading comments.  Unfortunately, it does come down to diligence again, and even “good” developers will easily fall back to rarely including comments just to avoid this situation.  Of course, another potential pitfall of supposedly self-documenting code is, what’s “obvious” to one person isn’t necessarily so to everyone else!

(We both enjoy writing, can you tell?)  =P

So basically we agreed to “moderation in all things” and exchanged Buddha statues and sang Kum-Bay-Yah.  I enjoyed the exchange because it really got us both thinking more deeply about which areas of our business/app landscape are in better/worse shape than others.

developer-dba-harmony-buddha
yay collaboration!

To conclude this part.  You will continue to see DBAs rant and rail against nested views and other “sins against SQL”, but:  Developers, don’t take it personally — we’re just trying to eke the most performance-per-$3k-core-license out of our precious servers, and spend less time chasing the white rabbit down the nested-views-hole.  And DBAs, go easy on your Devs — they still outnumber you, and they can whip out a complete web-app using the hottest JavaScript framework and a cloud-of-the-month service, faster than you can tune a server.  Everybody’s valuable, and everybody works toward the same goal: solving the business’s problems thru technology.

Moving on…

Part 3: Misusing & Abusing Datatypes

Because I’m getting long-winded again, let’s wrap up with a final “Clean SQL Code” topic that’s short & sweet.

Well, not really.  There are entire presentations dedicated to this topic.  But I’ll try to keep it condensed.

A date is not a datetime is not a time is not a time interval.  Okay?  For the third time, stop interchanging them!  Yes I know, SQL Server is a bit behind some other RDBMS platforms when it comes to this stuff.  Sorry, I don’t work for Microsoft.  I just deal with their tech.

Deep breaths…

More to the point, know your data.  Understand that there can be consequences to repeatedly casting types, or losing precision during conversion, sometimes exponentially so.  Yes I know, we all love loosely-typed (sometimes stringly typed) languages like JS & Python.  Those are wonderful tools for certain jobs/problems.  Again, be mindful and know your flows.
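
A quick illustration of the kind of consequence I mean; the table & column are made up, but the pattern should look familiar:

-- this works, but converting the column to a string defeats any index on EventDateTime:
SELECT COUNT(*)
FROM dbo.WidgetEvent
WHERE CONVERT(varchar(10), EventDateTime, 120) = '2017-08-01';

-- same question, asked as a range on the native datetime, stays seek-friendly:
SELECT COUNT(*)
FROM dbo.WidgetEvent
WHERE EventDateTime >= '2017-08-01'
  AND EventDateTime <  '2017-08-02';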

flow with the chart yo
I’m not sure what’s more disturbing.. the fact that this was the first image search result for “flow meme”, or the fact that it’s actually quite appropriate.

Thanks for reading, as always!