T-SQL Tuesday #104: Code You’d Hate to Live Without


It’s that time of the month again!  Bert’s fantastic invitation complete with YouTube vid is tempting indeed.  There are so many wonderful community scripts and tools for SQL Server DBAs.  But, in an interesting twist, he specifically asks us about something self-written or self-created.  And I’ll admit, I sometimes have trouble being exceptionally ‘proud’ of my home-brewed stuff.  Sure, I’ve blogged and GitHub’d, and hopefully those have helped someone along the way.  Most of the time, though, those are specifically suited to a given use-case.

The more difficult code to share, I think, is the stuff that we keep in our “daily-grind” “Get-Stuff-Done” folders.  Scripts that we’ve tweaked and commented sporadically throughout the years, finding this edge-case or that tweak for a given scenario, most of which ends up being pretty environment-specific.  And that’s where we could use a little encouragement, I think — another DBA saying “Hey, it’s not just me!”, or “I knew there was another guy/gal out there who always has to do this kind of thing!”.  Thus, here I will attempt to show off something along those lines and see what you kind folks think.

Where Does it Hurt?

No, I don’t mean the famous wait-stats queries by Paul Randal (those are awesome).  I mean, what are the top few areas of your SQL environment where you’re always questioning something, or being asked to find some answer that’s not immediately obvious?  For me, there are 3.

Replication

Transactional replication is, in a word, ‘brittle’.  It works well when it’s working, and usually you don’t have to think about it.  But when you do need to check on it, or change anything about it (god forbid), or troubleshoot it (more like shoot it, amirite?), it feels a bit like trying to mess with a half-played Jenga stack.  Meaning, you might be just fine, but you might send the whole thing crashing down and have to rebuild it all.

[image: peanut brittle, from The Joy of Baking]
As brittle as this stuff. But not nearly as delicious.

I won’t go into the whole troubleshooting aspect here; that’s too much scope for one post.  But there’s a question that often comes up, especially from Devs or Biz-Analysts: “Hey, is TableX being replicated?”  And rather than subject my poor eyeballs to the replication properties GUI, I just run a little query, which reads from the distribution database and the actual database being published (the ‘publisher’ db), and gives me the list of tables & columns being replicated.

Here it is.  Be sure to replace [dbName] with your actual published DB name.  Like StackOverflow or WideWorldImporters, or AdventureWorks <shudder>.
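Something along these lines (a minimal sketch; my full version joins in the distribution database as well, and you should verify the system-table names in your environment):

USE [dbName];  --your actual published DB
--List replicated tables & columns from the publisher-side transactional-replication metadata.
SELECT
    pub.name  AS PublicationName,
    OBJECT_SCHEMA_NAME(art.objid) + '.' + OBJECT_NAME(art.objid)  AS TableName,
    col.name  AS ColumnName
FROM dbo.syspublications pub
JOIN dbo.sysarticles art
    ON art.pubid = pub.pubid
JOIN dbo.sysarticlecolumns artcol
    ON artcol.artid = art.artid
JOIN sys.columns col
    ON col.object_id = art.objid
    AND col.column_id = artcol.colid
--WHERE OBJECT_NAME(art.objid) = 'TableX'  --"Hey, is TableX being replicated?"
ORDER BY TableName, ColumnName;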

Report Subscriptions (SSRS)

Another question I’m often asked is, “Hey, did ReportX run?”  What they really mean is, “Did ReportX‘s email subscription get sent to BigWigUserY?”  We have an unholy amount of SSRS reports with email subscriptions.  And because I don’t care to bloat my inbox by BCC’ing myself with every single one, I rely on the users to speak up when they don’t receive something they’re expecting.

“This is a terrible idea,” you say.  “Never trust the users!”

Yes, well, such is life.  If you have a better (read: lazier) solution, I’m all ears.

So here’s a little script I wrote to query the ReportServer database (which MS will tell you is “officially unsupported”, snore).  If I know some keyword from the report title, or the supposed recipient’s email address, this will tell me whether it ran successfully or not (LastRun, LastStatus), plus a couple other useful bits: where it lives (ReportPath) and what its SQL Agent job is called (JobName).

That last bit is peculiar.  The JobName looks like a GUID (because it is, because SSRS just loves GUIDs), but it’s also the actual name of the agent job, which you can use to re-run said job — via exec msdb.dbo.sp_start_job — if the failure wasn’t systemic.  As I often do.
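In sketch form, the query goes something like this (simplified; and since the ReportServer schema is “officially unsupported”, column names may shift between versions):

USE ReportServer;  --your report server catalog DB; the name may vary
DECLARE @keyword nvarchar(128) = N'%SomeReportOrRecipient%';  --title keyword or email address

SELECT
    cat.Name                                AS ReportName,
    cat.[Path]                              AS ReportPath,
    sub.LastStatus,
    sub.LastRunTime                         AS LastRun,
    CONVERT(varchar(36), sched.ScheduleID)  AS JobName  --that GUID is also the Agent job's name
FROM dbo.Subscriptions sub
JOIN dbo.[Catalog] cat
    ON cat.ItemID = sub.Report_OID
JOIN dbo.ReportSchedule sched
    ON sched.SubscriptionID = sub.SubscriptionID
WHERE cat.Name LIKE @keyword
    OR CONVERT(nvarchar(max), sub.ExtensionSettings) LIKE @keyword  --recipients hide in this XML
ORDER BY sub.LastRunTime DESC;

--And to re-run a failed subscription, when the failure wasn't systemic:
--EXEC msdb.dbo.sp_start_job @job_name = N'<that JobName GUID>';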

Disk Space

Last but not least, everybody’s favorite topic to try and forget about or pawn-off to the SysAdmins.  “How much space are those databases / data files / log files eating up?”  Well, mister suddenly-cares-about-disk-space-but-is-OK-with-storing-all-domain-users’-iTunes-music-libraries-on-the-central-fileshare-along-with-their-documents-because-that’s-what-we’ve-always-done.  {True story.}  Let me tell you.

[image: “Keep Calm and Release the Bitter”]
It’s healthy, I swear!

This script has a lot of comments and commented-out lines because I will tweak it depending on what I need to see.  Sometimes it’s “show me the DBs where the logical filename doesn’t follow my preferred pattern” (the DB name with ‘_data’ or ‘_log’ after it); or perhaps “Only show me files over 5GB with a lot of free space” when I’m itching to shrink something.
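The per-database core of it looks roughly like this; the full script loops over every database and keeps the filter variations handy as comments:

--Run in the database of interest; sizes are in 8KB pages, hence the /128 to get MB.
SELECT
    f.name             AS LogicalName,
    f.type_desc        AS FileType,
    f.physical_name,
    f.size / 128                                        AS SizeMB,
    FILEPROPERTY(f.name, 'SpaceUsed') / 128             AS UsedMB,
    (f.size - FILEPROPERTY(f.name, 'SpaceUsed')) / 128  AS FreeMB
FROM sys.database_files f
--WHERE f.size / 128 > 5120                                        --"only files over 5GB"...
--  AND (f.size - FILEPROPERTY(f.name, 'SpaceUsed')) / 128 > 1024  --..."with a lot of free space"
ORDER BY FreeMB DESC;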

“Never shrink your files!  (In production!)”

Yes, thank you, knee-jerk reactionary.  With this one, I’m usually dealing with servers in lower environments.  😉

What do you think?

I’d love to hear your feedback in the comments!  Happy Tuesday!!  =)

Credit where credit is due.

I did not magically come up with these all by myself.  They’re pieced together from many a StackOverflow answer and/or other community blog post or contribution that I’ve since forgotten.  I usually store a link to the source in this kind of thing, when it’s true copy-pasta, but in these cases I added enough of my own tweaks & style that I no longer tracked the original linkage.  If you see anything that looks familiar, please do tell me where to give kudos and shout-outs!  😀

T-SQL Tuesday #102: Giving Back

This month’s edition is hosted by Mr. Major (scribnasium / @RileyMajor), whose past #tsql2sday post I may have missed, but I love it all the same (BeyondCompare FTW!).  It’s about giving back to your community — mostly technical, i.e. the #SQLFamily / MS Data Platform community, but it can be about any community you’re a part of.

For starters, that’s what we blog for, at least partly.  I mostly blog about things I’ve learned or found interesting in my work, so that I don’t forget them later!  Of course, I’m always happy to learn that it helps someone else out there with a similar problem.  But there’s so much more to a community ecosystem than that.

SQL Saturday

SQL Saturdays are the perfect example of this.  I’ve been to 3 so far – 2 at Orange County, and 1 at San Diego.  I have to call out OC’s organizers, Ted Stathakis & .. the other guy.  Crap, I forgot his name.  He’s awesome though.  Srsly.  I talked with Ted a bit after a session at the last one and he truly is as nice as he is busy – 3 boys, all kinds of volunteer work, AND a full time job as the BI Director at Del Taco!  Wow!

I mean, it’s no Taco Bell, but still, Del Taco!  (Sorry, I had to..)  =P

I really want to volunteer to help at one of these.  I’m not sure what I’d do, but I know when the time comes, it’ll be worth it.

Lunch & Learns

To get my feet wet, I hosted a couple “lunch & learns” at my company with my Business Analyst and Development teams.  We talked about some basics, ranted about formatting a bit, tried to say goodbye to old habits from the SQL 2005 days (hello, date type!), and even dived into a few juicy things like performance-testing with IO stats and why you should never use scalar or multi-statement functions.  We also had a couple heart-to-hearts about DevOps and what it means in our environment.  (Hint: we’ve got a LOOONG way to go.)

At some point I’m going to scrub my slides and post them on SlideShare or something.  I just have to remove any company-specific info first.  ;o)

As with most teaching efforts, it helped me learn (and re-learn) some things myself!  I had to read some documentation to get wording exactly right, and I built a few playground DBs on my local machine to try out different concepts & examples.  Plus, it forced me to justify WHY I held certain opinions or notions on things, and in doing so, realize my own mistakes of the past.  Thus, I became a better DBA just by reinforcing some good practices and updating my own assumptions.

[image: “The More You Know”]
Yay learning!

Generosity

If there’s one thing I’ve learned from the many tributes to our late #SQLFamily member SQLSoldier, it’s the importance of being generous with your time.  Whether that means answering the #sqlhelp tag on Twitter, participating in the SQL Community Slack, answering StackOverflow / DBA.StackExchange questions, or just taking a few moments to help someone at your workplace with a tech problem — it all makes a difference.  I need to be better about this, as far as my “online presence” goes.  In-person, I’m always eager to help, but because I work remotely most of the time (yay!), it’s a bit of a moving target — the in-office days get packed with meetings and critical face-time (not FaceTime), so the peer-to-peer stuff is often postponed or relegated to Slack.

However, there’s a flip-side to this.  In being generous, we can’t forget to prioritize ourselves and our families.  I’m always being asked “Why are you working so much!?” — well, I could try to explain the difference between ‘work work’ and ‘tech community involvement’ and ‘self-betterment / career planning’ but… yeah.

Sorry, honey, be done in a sec!  =)

Anyway..

I encourage you, dear reader, to give back to your own communities, in whatever form that takes — tech/online, real-life, etc.  I promise that it will benefit you in some new & unexpected way.  Plus, whenever you make a real solid connection with someone, that’s worth its weight in gold.

[image: "it's not what you know, it's who you know"]
ORLY? Yes, RLY!

Favorite TSQL Tuesday #101 Posts


Since I didn’t even come close to making it in time for this month’s T-SQL Tuesday, I figured I’d highlight my 5 favorite posts from the community, and then share a few of my own tips/tools.

I use a Central Management Server too, and although I don’t often use it to run a query against multiple instances, it’s definitely a handy built-in feature to take advantage of.  A minor downside is that it only supports Windows Authentication (not SQL logins), so I can’t use it for my AWS RDS instances, nor for the CMS server itself — those I have to keep stored in my local “Registered Servers” section.  Another tool for running queries against multiple instances, with a good deal more flexibility, is Red Gate’s MultiScript, though it’s not free.  ;o)

Ethervane Echo, a clipboard manager and history-remember-er, is similar to something I use called Clipboard Fusion — in fact, it might even be better.  And who doesn’t love dbatools and dbachecks?  If you’re not using them yet, don’t wait; start getting into PowerShell today by at least trying out some of the ‘Get-’ cmdlets from dbatools.

Telegraf looks absolutely stunning as a monitoring system.  It does take some setup work and some maintenance, but it’d be a great branch-out learning opportunity to touch a few different technologies that a traditional SQL DBA might not normally think of.  Hats off to the people behind it, and may it continue to grow.

Leave it to Bert to go “outside the box” a bit, with these tools that help you be a better presenter and collaborator.  I use BeyondCompare, which is similar to WinMerge (tho, again, not free); I’ve fallen in love with its features that go beyond file diff/merge, but it’s nice to have a free option whenever I’m not on my main machine.

This is a broad sweeping post but it captures a LOT of what the community is and should be.  We’re inclusive, we want people to participate, grow & learn from each other, and ultimately advance their careers.  Tons of useful gems in here, from the Slack workspace to the event links to the networking advice.  Excellent stuff; go read it.

Honorable mention:

The SQL DB Modeler beta looks really interesting as an alternative to traditional big-$$$ tools like ER/Studio & Erwin.  If only I wasn’t stuck in brown-field legacy data models 95% of the time… =D

And finally, although they’ve probably been mentioned a few times already, pastetheplan and statisticsparser are two amazingly simple tools from the Brent Ozar folks that make sharing and comparing query performance so much easier.  My M.O. is to use PasteThePlan links in a dba.stackexchange post so that others can easily see the graphical execution-plan to offer feedback; while I use StatisticsParser to compare between A/B-testing runs of a stored-proc I’m trying to refactor & improve.

TSQL Tuesday #100 – Predictions for 2026

Yeah, so I missed the boat by a few days… okay, a week.  That’s pretty much my M.O.  This month’s T-SQL Tuesday #100 is hosted by the author of sp_WhoIsActive and the creator of T-SQL Tuesday himself, the legendary, the incomparable, Adam Machanic.

[image: applause, please]
You ain’t never had a friend like the SQL blogger community ;D

The Year is 2026

Despite IT’s best efforts to kill the relational database, it’s still alive and kicking.  Sure, it’s mostly in the cloud, and we’ve largely solved the problems of scalability, availability, and “traditional” maintenance, but the DBA still plays a critical role in the IT organization.  He/she is more of an architect and an automator, someone who understands the business and development needs as they relate to data — its storage, availability, security, and performance — and can leverage cohesive data platform technologies to provide those services and meet those needs.  But the fundamental issue of data quality still haunts even the best environments, because at the end of the day, when you rely on a human to enter text into a field, you’re gonna get garbage… er, inconsistency.  Thus, we’re still fighting that fight, if only to appease our “data scientists” and machine-learning models so that they stop whining about it.

SQL Server itself has evolved.  After realizing that it was pretty silly to bolt-on a hacky “graph db” component to what is, at its core, a relational engine, MS broke that off into its own product, “Microsoft GraphDB Server”.  But the good news is, SQL & GraphDB talk to each other seamlessly; in fact all of the data-platform products integrate and inter-operate much more smoothly than 10 years ago.

We finally have a single unified CE (Cardinality Estimator), which is intelligent enough to know which paths/plans to use for a given query, so we don’t need to mess with those awful trace-flags anymore.  Indexes and Statistics are all but self-maintaining; the DBA rarely has to step in and mess with them.  Part of the reason for this is that SQL Server yells at you if you try to make a GUID the clustering-key, or other such nonsense.  =D

Columnstore is everywhere; traditional row-store (b-tree) indexes barely exist.  JSON storage & indexing inside SQL Server is much better, but it’s still preferable to use a document-store DB if you can.  Hierarchical structures (not to be confused with graphs) are easily implemented and supported, without having to resort to old hacky models.  And user-defined functions (all types) perform nearly on-par with stored procedures.

They’ve replaced sp_who and sp_who2 with the code from sp_WhoIsActive, and made SSMS Activity Monitor suck less & actually be semi-useful as a basic first-response monitor.  Profiler was officially killed off, and XEvents has come into general widespread usage — largely because MS finally dedicated some hard-core dev time to improving its GUI & making it much easier to use.  Native Intellisense finally works, and works well, for all but the most obscure/weird things, and is much less chatty in terms of network traffic to/from the server.

And finally.  FINALLY.  Each database has its own TempDB.

[image: "and there was much rejoicing.. yay"]
We’d only been asking for it for.. 10 years?

T-SQL Tuesday #99: Counting Rows the Less-Hard-Way


This month’s invite courtesy of Aaron Bertrand (B | T), whose “bad habits” blog series still inspires many an impassioned debate or engaging discussion on a regular basis among DBAs & Developers alike.

And yes, I’m taking the easier of the two “dealer’s choice” choices — the SQL-focused one.  (I’m not big on sharing/blogging personal stuff, at least not here; I may one day start another blog for that, or perhaps just occasionally post more #off-topic stuff, but for now you’ll have to be content with my stories of vehicle troubles and the occasional movie-geekery.)

So, without further ado…

By the way, what is ‘ado’ and why should there be no further of it?

[image: Art Vandelay, importer/exporter]
Accidentally apropos on many levels…

Counting Rows in Really Big Tables

Previously touched on here, tables of unusual size (TOUSes) can be tricky.  You don’t want to lock them up for a long period of time, but you often need to gather information about them (such as row count, size, range of values) to perform some kind of operational maintenance with/on them.  In particular, Aaron’s post on “counting rows the hard way” inspired me to look into this a bit more and try to come up with a clever-ish way of finding out some basic “shape of data” info without actually querying (scanning) the table itself.

To start with, it’s actually really simple to get the total row-count from a few system catalog views — Aaron’s already shown you that, so I won’t repeat.  My interest is more in questions like “How many rows match a where-clause?” or “What are the min & max values for thatColumn?”

For this post, I’ll be focusing on a particular kind of table — the “history” or “transaction” table.  The idea here is that you have a record of “every time some event happens in/to some entity”.  A very common example is audit-trail tables, which I’ve been dealing with a lot lately.  Another common example is a “transaction history” table, such as, in our new favorite MSSQL demo database WideWorldImporters, the table Warehouse.StockItemTransactions.  It’s the 2nd largest table in the db at 260-some-thousand rows.  {The largest is a multi-million-row beast that is actually the system-versioned aka temporal table behind a “normal” table; I might build a phase-2 example around this, but not today.}  So, while our queries won’t be super slow, we’ll get enough of an idea of what’s bad & good from measuring our IO stats (with SET STATISTICS IO ON).

TL;DR: The demo script is available here; the headers below correspond to the comment-lines of the same name, but I’ve left enough commentary in the SQL itself to keep the average reader on-track, so feel free to check it out ahead of time.  But do keep reading at some point!  :o)

A. Gathering Some Intel

First up, we have good ol’ sys.sp_spaceused.  This gives you some sizing info about the table, including its row count.  There’s a “disclaimer” circulating around out there that it’s not “up to the millisecond accurate”, i.e. it might not have the most current row count if someone else is in the middle of an insert operation or whatnot.  But for all intents & purposes, you can consider it truth.
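For example:

EXEC sys.sp_spaceused N'Warehouse.StockItemTransactions';  --returns rows, reserved, data, index_size, unused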

Then you have the “hard ways” that people typically use — and that Aaron, again, covered just fine in his post on the subject, so I won’t spend any more time there.

But what if I want to count rows matching a where clause (a predicate)?  And in dealing with a typical history/transaction table, the predicate is almost always “between such and such dates”.  (Of course we won’t literally use the between operator, as we have been chastised severely; we know best to use >= and < !)  Also, I want to know the MIN and MAX of said dates in the table.  Lord knows we’re gonna be doing some table-scanning.
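In other words, queries shaped like these (using the date column we’ll meet in a moment; the dates are arbitrary):

--The "hard way" for a date-range predicate; note >= and <, not BETWEEN:
SELECT COUNT(*)
FROM Warehouse.StockItemTransactions
WHERE TransactionOccurredWhen >= '20150101'
    AND TransactionOccurredWhen <  '20160101';

SELECT MIN(TransactionOccurredWhen) AS minDate, MAX(TransactionOccurredWhen) AS maxDate
FROM Warehouse.StockItemTransactions;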

B. Ok, Let’s Try an Index

In their benevolent wisdom, the SQL deities decided not to give us an index on WideWorldImporters.Warehouse.StockItemTransactions.TransactionOccurredWhen.

BTW, how’s that for a verbose column name?  What, TransactionDate not good enough?  I suppose it is datetime2 after all, but still…

So we create an index on it, to see if that helps our poor “count where dates” query.  And behold, it does!  We’ve cut our # of logical reads down by about 90% (from 1900 to 200, if you’re following along in the script).  That’s fantastic, but… we can do better.  Because if the table is, say, 500 million rows instead of 260k, that’s about 400,000 logical reads, which.. could definitely suck.
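Something like this (the index name is just my own convention):

CREATE NONCLUSTERED INDEX IX_StockItemTransactions_TransactionOccurredWhen
    ON Warehouse.StockItemTransactions (TransactionOccurredWhen);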

C. The Better Way

Again, the script has an ode to Aaron’s query on sys.partitions/tables to get the row-count from the meta-data.  Then the real fun begins.

There’s a system DMV (or probably ‘DMF‘ – dynamic management function) called sys.dm_db_stats_histogram, which takes the table’s object_id and the index’s index_id as arguments.  It gives you, obviously enough, the statistics histogram of the statistics object corresponding to that index.  We want to store its output in a temp-table (or even a real table — go nuts!) so we can query it some more.

--For example, if our new index is index_id 7:
SELECT * FROM sys.dm_db_stats_histogram(OBJECT_ID('Warehouse.StockItemTransactions'), 7);

So we create our #StatsHist table (“hist” being an abbreviation for “histogram”, not “history”, though in retrospect that’s probably not worth the possible confusion), and we populate it with the meta-data from Warehouse.StockItemTransactions and its new index that we just created (on TransactionOccurredWhen).  Poof!  We have an easy way of showing min/max values in that column!  Well… almost.  We have to convert the variant datatype to an understandable & aggregate-able (probably a made-up word.. aggregable? aggregatable?) type.  So we add a column range_hk_proper of type datetime2 and populate it with the converted values of range_high_key from the stats-output.
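Condensed, the moving parts look something like this; it’s a rough sketch of the approach, and remember that histogram row counts are estimates, so expect “close”, not “exact”, unless stats were freshly updated with FULLSCAN:

CREATE TABLE #StatsHist (
    schemaName       sysname,
    tableName        sysname,
    step_number      int,
    range_high_key   sql_variant,
    range_rows       real,
    equal_rows       real,
    range_hk_proper  datetime2 NULL  --the aggregate-able version of range_high_key
);

INSERT INTO #StatsHist (schemaName, tableName, step_number, range_high_key, range_rows, equal_rows)
SELECT 'Warehouse', 'StockItemTransactions',
    sh.step_number, sh.range_high_key, sh.range_rows, sh.equal_rows
FROM sys.dm_db_stats_histogram(OBJECT_ID('Warehouse.StockItemTransactions'), 7) sh;  --7 = our new index_id

UPDATE #StatsHist
SET range_hk_proper = CONVERT(datetime2, range_high_key);

--min / max / total count, without touching the table:
SELECT MIN(range_hk_proper)          AS minDate,
       MAX(range_hk_proper)          AS maxDate,
       SUM(range_rows + equal_rows)  AS approxRows
FROM #StatsHist;

--and the "count where date-range" version:
SELECT SUM(range_rows + equal_rows)  AS approxRows
FROM #StatsHist
WHERE range_hk_proper >= '20150101'
    AND range_hk_proper <  '20160101';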

There!  Now we’re cookin’ with gas.  Our min/max/count query, and our “count where date-range” query, run in mere milliseconds, without ever touching the actual source table.  So we don’t lock it up or block anybody else from writing to it, even in the most pessimistic isolation levels.

Except when you created that index we needed on the date column.

Yes, I know.  What we’re hoping is that the tables we deal with in the “real world” already have such an index that we can take advantage of.  If not, well, that’s what maintenance windows are for.  And you better believe you’re gonna need that index sooner or later.

[image: cooking with (mustard) gas]
Been a while since I used a Family Guy meme….

Where To Next?

Conceivably, this whole thing could be turned into a stored-proc so you could run it “on demand” for any table with a date or datetime column that you wanted such information about.  It’d have to do a lot of error-checking, of course — it wouldn’t work if you don’t have such a column, or if there’s no index on it, and probably a myriad of other ‘gotchas’ that I’m not thinking of at the moment.  But I did try to lay the groundwork for improvement. #StatsHist stores schema & table name too, so if you felt like turning it into a mini-data-warehouse holding a BUNCH of stat-histograms for a whole mess of tables, you could definitely do that.  And then you could run some basic analytics on it — min/max/avg, counts by year/month/day, etc.

Sounds like fun, no?  ;o)

T-SQL Tuesday #98: Orphaned Users Redux

It’s that time again!  The first #Tsql2sday of 2018.  Thanks to the Blobeater for this month’s invite: “your technical challenges conquered”.

Because I’m already ridiculously late, I have a short one.  This is about orphaned users — you know, when you restore a database and its users aren’t mapped to the server logins that they should be or used to be.

[image: "Orphan" movie poster]
Not that kind of orphan… pretty decent movie tho!

The typical solution is sp_change_users_login with the auto_fix or update_one option.  But guess what?  Yep, that’s deprecated.  By the way, did you know that it also has a report option?  Apparently that’s got some bugs…ish?  Try it sometime and see — compare the output of sys.sp_helpuser where the ‘LoginName’ column is null, with sp_change_users_login 'report'.  Preferably on a DB you’ve restored from another server.  😉

So what’s the correct solution?  ALTER USER [theUser] WITH LOGIN = [theLogin].  Simple, no?  Let’s get more general.  Could we come up with a half-decent way to apply this kind of fix dynamically?  Well sure, the nice folks at DBATools have already solved that problem.  And that’s great, really.  But just in case that doesn’t work… ^_^

One of the many things I love about SQL Prompt is the right-click option to “Script as INSERT” (from the results grid).  This is a quick & easy way to build a temp-table for the results of an exec statement so you can do the ol’ insert #tmp exec sys.sp_blah!  Then we can query the list of DB users for the null LoginNames and write a little set of queries to fix them!  Sound good?

UPDATE: Behold the code!
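Or at least, a trimmed-down sketch of it; sp_helpuser’s output columns can vary a bit by version, so adjust the temp-table to match yours:

--Capture sp_helpuser output, find the orphans, generate the fixes.
CREATE TABLE #UserInfo (
    UserName       sysname,
    RoleName       sysname NULL,
    LoginName      sysname NULL,
    DefDBName      sysname NULL,
    DefSchemaName  sysname NULL,
    UserID         int NULL,
    [SID]          varbinary(85) NULL
);

INSERT INTO #UserInfo
EXEC sys.sp_helpuser;

--Generate the ALTER USER statements, assuming a same-named login exists on this server:
SELECT DISTINCT
    'ALTER USER ' + QUOTENAME(u.UserName)
    + ' WITH LOGIN = ' + QUOTENAME(u.UserName) + ';'  AS FixQuery
FROM #UserInfo u
WHERE u.LoginName IS NULL
    AND SUSER_ID(u.UserName) IS NOT NULL;  --only where the login actually exists

DROP TABLE #UserInfo;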

Happy Tuesday!

PS: Coincidentally, today’s (Thursday) SQL Server Central newsletter featured a very similar post by a gentleman over at Madeira Data.  Go check it out, it’s another great solution to this problem!  And while you’re at it, get the SQL Server Radio podcast (created by a couple guys from the same company) – it’s a terrific addition to your iTunes library.

TSQL Tuesday #96: Good Influences

This month’s invitation is brought to you by Ewald Cress (blog, twitter), who I already like based on his tagline —

finds joy in minutiae..

Yes, my friend, don’t we all.

[image: Elaine from Seinfeld]
I can’t spend the rest of my life coming into this stinking apartment every 10 minutes to pore over the excruciating minutiae of every single daily event!

The topic at hand is fairly non-technical, but still important: folks who have made a positive contribution to your career or professional development.  So it’s time for a shout-out!  About a year ago, I wrote about my first major career move.  There were several great influences in my first job, from the developers that taught me how to code, to the DBA who taught me how to keep cool & calm in the face of outages, to the boss who taught me the importance of time management and breadth of knowledge.

Post-mortem

Since I am way too late in posting this, and I don’t feel like waxing poetic, I’ll just say a general “thank you” to all those who’ve helped me along in my career so far, with special acknowledgement to my former boss, my current boss, the SQL family, and my own family.  Happy belated Thanksgiving and have a safe & pleasant holiday season!  I’ll have a real post again quite soon, diving back into the tech stuff.

[image: food coma, happy belated Thanksgiving]
hope they don’t mind me borrowing their image… =D

TSQL Tuesday 95: Big Data

This month’s party brought to you by Mr. Hammer (b|t).

[image: MC Hammer]
No, not THAT one…

I apologize in advance for all the hammertime memes.  It was just too good to pass up.  Surely he must be used to this.  Or at least not surprised by it.  =D

So, Big Data.  What is it?  Well, in simple terms, it’s the realization and acceptance of the fact that data is multi-model, multi-faceted, multi-sourced, and constantly growing.  It’s the fact that the traditional RDBMS is no longer the be-all end-all source of truth and valuable information.  It’s part of a larger ecosystem involving JSON document stores, CSV files, streaming volatile bits of data coming from random devices and user activity that loses its meaning and potential impact almost as quickly as it can be gathered and sifted and stored.

But what do we actually get out of it?  As a small-medium enterprise NOT in the software business, I have to say, not as much as the hype would have us believe.  And look, I’m not so jaded and crusty that I refuse to adopt new tech.  I just haven’t seen a meaningful, transformative business use-case for it.  Sure, we have Google Analytics telling us how our websites are doing, and someone in marketing knows something about trending our social media traffic.  Does it really help us make more money?  Heck if I know.

[image: "cease thy actions, my timepiece has indicated the necessity of mallets"]
Old-timey colonials can even dig it…

Here’s what I’d like to see from the thought leaders.  Give me something I can chew on — a real-world, non-hypothetical, non-frivolous, impactful use-case for adopting and implementing something like Hadoop/Spark or Azure Data Lake.  Show me how my business can realistically journey down the path of predictive analytics and what it’s going to take from our Devs, IT staff, and management to actually get there.

Because they don’t get it yet.  I have managers still worrying about how much we’re spending on a dinky little flash storage array to support the growing needs of our on-prem converged infrastructure stack.  Meanwhile the AWS bill continues to baffle, and Devs want to play with Docker and Lambda.  But we can’t seem to convince the higher-ups that they’re short-staffed on the internal-apps team, even after a minor version upgrade takes 4 hours of Ops time and half a dozen end-users doing post-mortem testing just to be sure we didn’t break anything unexpected.

I’m not here to complain.  Really.  I do want to see something amazing, something inspiring, something that shows me what Big Data truly brings to the table.  And sure, I’ve seen the vendor demos; they’re all just a bit outlandish, no?  I mean, they look really cool, sure — who doesn’t want to see a chord diagram of who’s killed whom in GoT? — but does that really help my business improve sales and productivity?

My point is, there’s a gap.  A chasm of misunderstanding and mis-matched expectations between what management thinks Big Data is/means, and what it takes to actually implement.  They see the pretty pictures and the fancy demos, but they don’t see the toil and sweat (or at least, in the cloud, gobs of cash) that go into building & operating the underpinnings and pipelines that drive those nice graphics.  Not to mention the fundamental issues of data quality and governance.

[image: "continue not, time for hammer it is"]
OK OK, last one, I swear…

So do us a favor, Big Data pundits.  Show us something real, something that “the little guy” can use to up his/her game in the market.  Something that makes a positive impact on small non-startup, non-software businesses with understaffed IT & Dev teams.  But more importantly, stop glossing over the effort and resources that it takes to “do Big Data right”.  Managers and executives need to understand that it’s not magic.  And IT practitioners need to understand that it’s actually worthwhile.  Because I believe you — really — that the payoff in the end is there, and is good.  But you need to convince the whole stack.


PS: I know this is a full day late for T-SQL Tuesday, and as such, I wasn’t going to post a ping-back in the comments of the invite, but then I saw there were only 8 others, so I felt it would benefit the event if I did add my late contribution.  I’ll tweet with a modified hash-tag instead of the standard #tsql2sday, to reflect my late-ness.  Hopefully that’s a fair compromise to the community & the event’s intentions.  =)

PowerShell and BITS


Welcome back!  This month’s topic is PowerShell — thanks to one of our prominently bearded community members.  PowerShell is a fantastic tool in the IT professional’s toolbelt.  I was first introduced to it somewhere in 2014 or 2015 by a colleague, and started making much heavier use of it when my career took me to a new & bigger environment.

Actually, funny side-story.  I remember seeing one of the very early incarnations of PowerShell, or what would eventually evolve into it, in college.  A graphics programming course, of all things, had a MS partner come in to show us this “cool new” Windows command-shell thing (different and separate from the DOS-style CMD, obviously), where he demonstrated fetching some data from the filesystem, feeding it into a CSV, and doing some kind of super-basic analysis on it to show in a “report” (which was really just another text file).  This was 2005-2006, so I couldn’t say what it was specifically, though I seem to remember something about the word “Longhorn”.  Although, reading up on some of the Wiki-history, it seems more likely that it was a Monad beta.

[image: "that is so four score and seven years ago"]
Don’t ask me why he’s wearing horn-rims. I don’t know.

Preamble

Anyway, back on topic.  Today’s post is pretty simplistic in comparison to what most people may be writing about.  But I’ve already blogged about doing hands-off SQL installation with PowerShell & CLI, and this was another thing kicking-around the back of my mind.  So this is about using PowerShell and the BITS framework (*-BitsTransfer cmdlets) to copy very large files across servers.  Specifically, database backups.  Because let’s face it, they can be really large.  And if you’re faced with fetching them off a PROD box, you want to minimize the impact on that box’s resources.

Now sure, there are other ways – xcopy or robocopy with the /J flag (un-buffered IO), or fancy GUI tools.  And in an ideal world your backups would be written to a network share that’s not a local drive on the PROD SQL server, right?  Right…

Oh, and one more thing.  You need to enable BITS via the Windows Features console — search “features” in your Start menu and it should come up as Turn Windows features on or off (Control Panel).  On a server, it’s under the Server Role “Web Server (IIS)”, feature “Background Intelligent Transfer Service (BITS)”.  Underneath there are 2 sub-feature choices, “IIS Server Extension” and “Compact Server”.  Honestly I don’t know which is preferable, but I left it at the default selection, the former.  It should go without saying, but don’t do this in production (unless you have the blessing of your SysAdmins).

But Why?

Why BITS?  Well, as per the Docs, it has the following 3 key features (emphasis mine):

  • Asynchronously transfer files in the foreground or background.
  • Preserve the responsiveness of other network applications.
  • Automatically resume file transfers after network disconnects and computer restarts.

Wow, nifty!  So it doesn’t hog the network, and it’s resumable (resume-able?) in case of connectivity hiccups.  Pretty sweet, no?  Also, it can run asynchronously in the background, which means it won’t hog your storage bandwidth or compute resources.

[image: "async all the things!"]
Because we can.

Let’s See an Example

Most of the guts and inspiration for this came from this article over on “Windows OS Hub” (woshub, a somewhat unfortunate-sounding acronym, but certainly not as bad as some!).  The datePattern nonsense is just to make it “dynamic” in the sense that, if you have a backup scheme like mine, with Sunday FULLs, daily DIFFs, and obviously TLogs in some every-X-minutes fashion, you’ll usually want the latest set of FULLs and DIFFs.  But you could easily tweak this, make it more point-in-time aware, or whatever, as needed.

So, here’s a bit of a talk-thru outline, and then I’ll just link the Gist.

  1. Get the list of files we want to copy, from “source”
  2. For each file:
    1. Make sure it doesn’t exist in the “destination”
    2. If not, start a BITS transfer job (saving said job to a variable for checking/finishing later)
    3. While said BITS job is pending, print a progress message and sleep for some seconds
    4. Finish (“complete”) said job and move on to the next file
  3. Conclude with a message about how much work we just did!
  4. Repeat steps 1-3 for another “set of files” (list) if desired

And without further ado, the code.

# AUTHOR: Nate Johnson, @njohnson9402/@natethedba, natethedba.wordpress.com
# LICENSE: https://choosealicense.com/licenses/unlicense/
# TYPE: PowerShell script
# DESCRIPTION/USAGE: example of 2 file copying loops using BITS Transfer cmdlets, targeted at SQL backups
# See corresponding blog post at https://natethedba.wordpress.com/powershell-and-bits/
# ASSUMPTION: your SQL backup files are named with a date component in the form "yyyyMMdd", and use extension ".bak"
$source = "\\BigSQL\Backup\BigSQLInstance\"  #network share or local directory
$destiny = "X:\Restore\BigSQLInstance\"      #LOCAL drive destination
$testonly = 1       #set to 0 to actually do the copying!
$sleepSeconds = 3   #how many seconds to sleep between each progress-check during each file transfer

# use some date-math to get last Sunday's FULL backups
$datePattern = $(Get-Date).AddDays(-$((Get-Date).DayOfWeek.value__)).ToString("yyyyMMdd")  #e.g. "20170910"
$files = Get-ChildItem -Path $source -Recurse -Filter ("*_FULL_" + $datePattern + "*.bak")
Write-Host ("About to copy " + $files.Count + " files to " + $destiny)
for ($i = 0; $i -lt $files.Count; $i++)
{
    $srcfile = $files[$i].FullName
    $dstfile = $destiny + "\" + $files[$i].Name
    Write-Host ("  Copying " + $srcfile)
    if (Test-Path $dstfile)
    {
        Write-Host ("  File already exists: " + $dstfile)
    }
    elseif ($testonly -eq 0)
    {
        $bitsJob = Start-BitsTransfer -Source $srcfile -Destination $destiny -Asynchronous -Priority Low
        while (($bitsJob.JobState.ToString() -eq "Transferring") -or ($bitsJob.JobState.ToString() -eq "Connecting"))
        {
            #only display progress when transferring
            if ($bitsJob.JobState.ToString() -eq "Transferring")
            {
                $pctCmplt = ($bitsJob.BytesTransferred / $bitsJob.BytesTotal)
                #see https://technet.microsoft.com/en-us/library/ee692795.aspx for number formatting trick
                Write-Host ($bitsJob.JobState.ToString() + " " + $("{0:P2}" -f $pctCmplt))
            }
            Start-Sleep -Seconds $sleepSeconds
        }
        Complete-BitsTransfer -BitsJob $bitsJob
    }
    Write-Host ("  Finished transferring " + $srcfile + " to " + $destiny)
}

# today's DIFF backups
$datePattern = $(Get-Date).ToString("yyyyMMdd")  #e.g. "20170912"
$files = Get-ChildItem -Path $source -Recurse -Filter ("*_DIFF_" + $datePattern + "*.bak")
Write-Host ("About to copy " + $files.Count + " files to " + $destiny)
for ($i = 0; $i -lt $files.Count; $i++)
{
    $srcfile = $files[$i].FullName
    $dstfile = $destiny + "\" + $files[$i].Name
    Write-Host ("  Copying " + $srcfile)
    if (Test-Path $dstfile)
    {
        Write-Host ("  File already exists: " + $dstfile)
    }
    elseif ($testonly -eq 0)
    {
        $bitsJob = Start-BitsTransfer -Source $srcfile -Destination $destiny -Asynchronous -Priority Low
        while (($bitsJob.JobState.ToString() -eq "Transferring") -or ($bitsJob.JobState.ToString() -eq "Connecting"))
        {
            #only display progress when transferring
            if ($bitsJob.JobState.ToString() -eq "Transferring")
            {
                $pctCmplt = ($bitsJob.BytesTransferred / $bitsJob.BytesTotal)
                #see https://technet.microsoft.com/en-us/library/ee692795.aspx for number formatting trick
                Write-Host ($bitsJob.JobState.ToString() + " " + $("{0:P2}" -f $pctCmplt))
            }
            Start-Sleep -Seconds $sleepSeconds
        }
        Complete-BitsTransfer -BitsJob $bitsJob
    }
    Write-Host ("  Finished transferring " + $srcfile + " to " + $destiny)
}
Write-Host ("All done!")

The Catch

There are some downsides here.  First, you cannot use BITS in a non-interactive mode, i.e. inside a Scheduled Task as a User that’s not logged-in.  This is because it’s a “desktop”-oriented feature, not a “server” one.  Second, I’ve never been able to get multiple transfers going at once — or at least, multiple PoSh scripts which use BITS transfers.  This could very well be my fault, but it does seem like the BITS jobs are “serial” in nature, i.e. one must finish before the next one can start.  Again, not the expert, just observing what I found during my experiments.

[image: parallel vs. serial ports on an old computer]
Obviously, serial won out in the end (specifically, his superstar protege, USB), but you gotta hand it to parallel, he had a good run.

Conclusion

BITS transfer is an interesting method for copying extra-large files around your environment with low overhead.  PowerShell makes it easily accessible and lets you wrap it up in loops and checks so you can effectively build a progress-indicative, predictable and reproducible method for copying a whole SQL server’s set of backups from one place to another.

What cool little things have you discovered using PowerShell?  Let me know!  Thanks for reading.

TSQL Tuesday 93: Interviews

This month’s event is hosted by the fabulous DBA/SQL-consultant (& part-time cartoonist) Kendra Little! Go check out her blog, podcast, and training at sqlworkbooks.com – really great stuff.

DBA interviews are tricky.

The problem isn’t so much that the role is vaguely defined. Although, depending on the size of the IT org and the tech stack, it can vary widely from a jack-of-all-trades (DB Dev, report writer, production ops, the works) to a highly specialized performance tuner who works with 7 other teammates, each of whom has a unique surgical specialty in the data platform. But that’s not the problem — well, not the main problem. It is a problem in the sense that the business folks, especially HR, are notoriously and astonishingly ignorant of what a DBA or related role actually involves. But you get past that once you start talking to the tech leads and IT directors.

No, the problem is that, like most higher-level technical roles, you don’t really know how a candidate is going to function in it (the role) without actually seeing him or her.. IN it. Do they keep a cool head when production goes down? Do they have a solid plan of attack for the dreaded “everything is slow!” complaint-storm? Do they have a good handle on HA & DR architecture & implementation? Can you rely on them to actually practice and follow thru with those strategies? Can they be a continuous learner and keep abreast of new developments while still tempering that with wisdom & maturity, applying the correct tools to the proper problems? Do they add value to the team and organization by both teaching and learning from others?

These are truly difficult, complex questions that are nearly impossible to deeply assess and fully answer during an interview process. Largely because the only real evidence of their answers lies in actual experience. Sure, a cert here or an MVP there definitely helps your case. But at any rate, we try our best to chip away at the boulder.

[image: Aerosmith's "Chip Away at the Stone" record label]
it was a record. from the 70s. it's the best i could come up with. =P

Pivoting to a more positive note, I’ll share some of the better questions that I’ve experienced during my career so far.

Some good examples.

How would you design and build a data copy/sync process across/between tiered environments, say DEV-QA-PROD?

Really great question.  This is a common problem in small-to-medium enterprises with legacy systems, where DevOps hasn’t quite reached down to the depths of the internal application stacks and people are still dealing with “refresh cycles” on the order of months, quarters, or even years.  You can approach it purely from a tooling perspective, but that’s not the whole picture.  Thus, it calls for some thought and team-culture ideas as well as “knowing the nerd-knobs”.

We have a complex process flow that involves a lot of stored procedures, say 50 total.  Some of these are non-sequential, meaning they can be executed in arbitrary order, while others need to be sequenced with each other in “blocks”.  This is a vendor product, so ultimately, the customer gets to decide the schedule and order of execution of this flow.  The solution needs to be maintainable by field engineers.  How would you handle this?

Woah.  Talk about diving down a rabbit-hole.  This is interesting in the sense that it exposes a bit of the architecture and some of the potential pain-points that the team is hoping to solve, while leaving enough room for improvement and experimentation by the hopeful candidate.  More to the point, it’s just an example of a more general technique, which to me is very effective: taking an architectural problem that actually comes from the “real world” (the company/team that’s interviewing) and asking for the candidate’s ideas on how to solve it.  You don’t need to get in-the-weeds super-detailed about it, but outlining your ideas helps indicate how you think about complex challenges and shows what kind of value-add you would bring to the team.

And finally, a perennial favorite:

Tell me about a time you broke production, and more importantly, how you addressed and resolved it.

So many stories from the trenches involve downtime and mistakes, it’s good to ‘bond’ over them.  It helps bring the egos back down to earth, and reminds us that we’re all just meatbags, making technology to do our bidding, occasionally to our own regret.  It shows the candidate’s “pressure cooker” mentality, or at least, what they tell you about it.

[image: "server did what you instructed it to... you don't say?"]
Don’t ya hate it when…

In conclusion.

If you’re a DBA, Dev, or IT pro, help your managers better understand your team’s needs when it comes to hiring.  Get involved in the job description write-ups and screening process questionnaires.  Barge your way into those ivory towers if you have to — or you’ll regret the time wasted on candidates who really belong in a different role than the one you’re trying to fill.

If you’re a manager, PLEASE LISTEN to your reports and tech leads.  They know what makes a good team member, they’ve been doing it for a long time.  Don’t dismiss their advice or block them from being part of the hiring process — yes, they are busy, and yes, they can be crotchety, but their input is highly valuable toward bringing in effective & productive talent.

That’s all folks!

PS: I know I missed the “deadline” by about an hour..ish.  I blame DST.  Heck, it’s still Tuesday for the majority of the Western hemisphere.  I’m not biased, but I write in English, so… ya know.  Take it as you will.  Now excuse me while I go hide from the blog-police in my ASCII-bunker.

*\ |* ___________
\| ___/ I___I___I_\___
/___I___I___I___I___\ Incoming!!
/__I___I___I___I___I__\ /
___ /_I___I___I___I___I___I_\ ___
/ \ __ /____I___/ \___I____\ __ / \
/ \ / \ _ /___I____/ O \____I___\ _ / \ / \
/ \_/ \_/ \__/__I___I__|____^____|__I___I__\__/ \_/ \_/ \
