In light of https://natej.dev/2021/08/13/new-beginnings/ (which you should also read, if you care about personal celebrations of life and love), I have a bit of good news to celebrate at work as well: I'm finally not the LONE DBA!!
Look ma, I haz underlings!
Also, if you’re at all into the collectibles market, our company is hiring. And our tech org is growing FAST. Like, blazingly fast. Come check us out and apply. We’re super nice folks. (Except this Nate the DBA character, he gets ornery.) =P
I have almost zero downtime anymore, so don't expect an impressive amount of content out of me, but I am getting back into writing and teaching, so I hope to do at least a few T-SQL Tuesdays over the rest of this year.
this is about using PowerShell and the BITS framework to copy very large files across servers…
Welcome back! This month’s topic is PowerShell — thanks to one of our prominently bearded community members. PowerShell is a fantastic tool in the IT professional’s toolbelt. I was first introduced to it somewhere in 2014 or 2015 by a colleague, and started making much heavier use of it when my career took me to a new & bigger environment.
Actually, funny side-story. I remember seeing one of the very early incarnations of PowerShell, or what would eventually evolve into it, in college. A graphics programming course, of all things, had an MS partner come in to show us this "cool new" Windows command-shell thing (different and separate from the DOS-style CMD, obviously), where he demonstrated fetching some data from the filesystem, feeding it into a CSV, and doing some kind of super-basic analysis on it to show in a "report" (which was really just another text file). This was 2005-2006, so I couldn't say what it was specifically, though I seem to remember something about the word "Longhorn". Although, reading up on some of the Wiki-history, it seems more likely that it was a Monad beta.
Anyway, back on topic. Today’s post is pretty simplistic in comparison to what most people may be writing about. But I’ve already blogged about doing hands-off SQL installation with PowerShell & CLI, and this was another thing kicking-around the back of my mind. So this is about using PowerShell and the BITS framework (*-BitsTransfer cmdlets) to copy very large files across servers. Specifically, database backups. Because let’s face it, they can be really large. And if you’re faced with fetching them off a PROD box, you want to minimize the impact on that box’s resources.
Now sure, there are other ways – xcopy or robocopy with the /J flag (un-buffered IO), or fancy GUI tools. And in an ideal world your backups would be written to a network share that’s not a local drive on the PROD SQL server, right? Right…
Oh, and one more thing. You need to enable BITS via the Windows Features console — search “features” in your Start menu and it should come up as Turn Windows features on or off (Control Panel) . On a server, it’s under the Server Role “Web Server (IIS)”, feature “Background Intelligent Transfer Service (BITS)”. Underneath there are 2 sub-feature choices, “IIS Server Extension” and “Compact Server”. Honestly I don’t know which is preferable, but I left it with the default selection, the first (former). It should go without saying, but don’t do this in production (unless you have the blessing of your SysAdmins).
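If you'd rather not click through the GUI on a server, the same feature can be enabled from an elevated PowerShell prompt. This is a sketch using the ServerManager module that ships with Windows Server; the sub-feature names come from what Get-WindowsFeature reports:

```powershell
# Enable BITS (and the default "IIS Server Extension" sub-feature)
# without the GUI. Run from an elevated prompt on Windows Server.
Import-Module ServerManager
Install-WindowsFeature -Name BITS, BITS-IIS-Ext
```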
Why BITS? Well, as per the Docs, it has the following 3 key features (emphasis mine):
Asynchronously transfer files in the foreground or background.
Preserve the responsiveness of other network applications.
Automatically resume file transfers after network disconnects and computer restarts.
Wow, nifty! So it doesn't hog the network, and it's resumable (resume-able?) in case of connectivity hiccups. Pretty sweet, no? Also, it can run asynchronously in the background, which means a long-running copy doesn't tie up your session while it works.
Let’s See an Example
Most of the guts and inspiration for this came from this article over on "Windows OS Hub" (woshub, a somewhat unfortunate sounding acronym, but certainly not as bad as some!). The datePattern nonsense is just to make it "dynamic" in the sense that, if you have a backup scheme like mine, with Sunday FULLs, daily DIFFs, and obviously TLogs in some every-X-minutes fashion, you'll usually want the latest set of FULLs and DIFFs. But you could easily tweak this, make it more point-in-time aware or whatever, as needed.
So, here’s a bit of a talk-thru outline, and then I’ll just link the Gist.
Get the list of files we want to copy, from “source”
For each file:
Make sure it doesn’t exist in the “destination”
If not, start a BITS transfer job (saving said job to a variable for checking/finishing later)
While said BITS job is pending, print a progress message and sleep for some seconds
Finish (“complete”) said job and move on to the next file
Conclude with a message about how much work we just did!
Repeat steps 1-3 for another “set of files” (list) if desired
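The outline above boils down to something like this minimal sketch. The share path, destination, and date pattern are placeholder assumptions (the real thing is in the Gist); the cmdlets are the standard *-BitsTransfer ones:

```powershell
# Paths and pattern are placeholders -- adjust to your own backup scheme.
$source      = '\\PRODSQL01\SQLBackups'        # hypothetical source share
$destination = 'E:\Restores'                   # hypothetical destination
$datePattern = (Get-Date).ToString('yyyyMMdd')

# Step 1: get the list of files we want, from "source"
$files = Get-ChildItem -Path $source -Filter "*$datePattern*.bak"

foreach ($file in $files) {
    $target = Join-Path $destination $file.Name
    if (Test-Path $target) { continue }        # already in "destination"? skip it

    # Start an asynchronous BITS job, saving it for checking/finishing later
    $job = Start-BitsTransfer -Source $file.FullName -Destination $target `
           -Asynchronous -DisplayName "Copy $($file.Name)"

    # While the job is pending, print a progress message and sleep a bit
    while ($job.JobState -in 'Queued','Connecting','Transferring') {
        if ($job.BytesTotal -gt 0) {
            $pct = [math]::Round(100 * $job.BytesTransferred / $job.BytesTotal)
            Write-Host "$($file.Name): $pct% transferred..."
        }
        Start-Sleep -Seconds 10
    }

    # Finish ("complete") the job, which renames the temp file into place
    Complete-BitsTransfer -BitsJob $job
}

Write-Host "Copied $($files.Count) file(s) from $source to $destination."
```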
There are some downsides here. First, you cannot use BITS in a non-interactive mode, i.e. inside a Scheduled Task as a User that’s not logged-in. This is because it’s a “desktop”-oriented feature, not a “server” one. Second, I’ve never been able to get multiple transfers going at once — or at least, multiple PoSh scripts which use BITS transfers. This could very well be my fault, but it does seem like the BITS jobs are “serial” in nature, i.e. one must finish before the next one can start. Again, not the expert, just observing what I found during my experiments.
BITS transfer is an interesting method for copying extra-large files around your environment with low overhead. PowerShell makes it easily accessible and lets you wrap it up in loops and checks so you can effectively build a progress-indicative, predictable and reproducible method for copying a whole SQL server’s set of backups from one place to another.
What cool little things have you discovered using PowerShell? Let me know! Thanks for reading.
What could possibly go wrong? As it turns out, plenty.
Faithful reader(s), it’s been a while! I’ve been busy preparing for some big transitions. I’m also getting better at MDX queries, tweaking SSAS-based reports to more accurately reflect the business rules. But enough about that, on with the post!
In which we doubt the SAN
A storage area network (SAN) is a management & administration solution, not a performance solution.
-someone wiser than me
SANs are wonderful technology. They inspire all kinds of geekery and are purported to solve all your storage woes. But there’s a catch: they’re expensive. Not just as a capital expense, but in maintenance and licensing costs. And if you ever want to upgrade it, like add some more drives to a particular tier/pool — fuhgeddaboudit.
So what do we do with SQL on a SAN? Well, it has tiers, right? Slower storage with huge capacity, faster storage with less, etc. We put the data files (heavy random read workload, typically) on the pool optimized for that kind of I/O pattern. We put the TLog files (heavy sequential write workload) on the pool best suited for that. And what about good ol’ TempDB? Its access pattern is fairly unique — random writes and reads, and frequent overwrites, which means it could potentially wear out your typical prosumer SSD relatively quickly. But we’re not complete cheapskates, we’ll buy enterprise class SSDs, no?
So we go read some stuff and figure, hey, sounds like a great idea, right? Put TempDB on a local SSD, or better yet, a pair of SSDs in RAID-0 for pure performance (because this is a cluster, we’ve got HA already). We’ll reduce the load on the SAN I/O channels and make our overworked TempDB happier with lower latency and better throughput. Right?
In which we discover what could possibly go wrong.
Once the new drive(s) is(are) installed and “presented” to Windows (that’s my SysAdmin’s term), it’s fairly trivial to do the SQL configuration change — it does of course require a SQL service restart (or cluster failover). Code example, assuming your new drive is ‘T’:
alter database tempdb
modify file (name=tempdev, filename='T:\tempdb.mdf');
alter database tempdb
modify file (name=tempdb2, filename='T:\tempdb2.ndf');
You do of course have multiple TempDB data files, yes? Good.
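A quick sanity check after making the change: the new paths show up in the catalog immediately, but remember they only take effect after that restart/failover.

```sql
-- Where does SQL Server think tempdb's files live (or will live after restart)?
SELECT name, type_desc, physical_name
FROM sys.master_files
WHERE database_id = DB_ID('tempdb');
```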
Should we put templog (TempDB’s transaction log) on the same drive as the TempDB data files, or put it on the same storage pool as the regular DBs’ TLogs? As usual, “it depends” — ask your favorite SQL gurus and do some testing.
Back on topic
We’ve made the change, we’ve done the cluster failover. TempDB is now running on our spankin’ new SSD. So we start monitoring performance metrics. Things like file I/O stats (from SQL DMV sys.dm_io_virtual_file_stats), latency and waits (from our monitoring tools), and good ol’ PerfMon.
But wait, what’s this? I/O stalls are higher? Write latency is higher?!? Perfmon agrees?
Write latency on the TempDB files was over 10x higher than it was when they were on the SAN (the performance tier, to be clear). The file_stats DMV showed large increases in I/O stalls. Sad-trombone.
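For the curious, here's roughly the kind of query we were watching against that DMV. It's a sketch; the latency math is just cumulative stall divided by I/O count (cumulative since instance startup, so compare snapshots over time):

```sql
-- Average read/write latency per tempdb file.
SELECT DB_NAME(vfs.database_id)                              AS db_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.num_of_writes,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)   AS avg_read_latency_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)  AS avg_write_latency_ms
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY mf.file_id;
```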
In which we have several theories
Then ensued various conversations and brainstorms among my colleagues.
Someone check the firmware/drivers!
Maybe it’s got the wrong block-size.
Well, it’s only 6Gbps SAS… maybe we should’ve sprung for the 12Gbps.
The write latencies went up by a factor of 10. I don’t think an improvement by a factor of 2 is going to win you any trophies.
Why didn’t we get an NVMe or M.2 one?
Because the damn blades don’t have those slots, goober.
Another interesting observation, and potentially the silver lining. Overall instance waits (wait stats), according to our monitoring tool, went down. That’s good news, right? Maybe. Does application performance & user experience corroborate it? Possibly! We’ll be observing the patient for another week or so.
Let's turn to the community again to see what others have experienced.
Oh wait, it might not be such a fantastic idea after all.
And by "we" I mean "me". Being the DBA and the primary proponent of the SSD addition, because I knew our workloads were very TempDB-heavy, I had to tuck tail and admit that the SAN gods won this round.
But wait, what about the fact that our wait stats are down? What about app/user experience? Valid arguments, I agree. That’s why we’re still observing. But I’m not optimistic, given the follow-up links above. We may utilize local SSDs for something else (index filegroups?) — but if those write latencies don’t improve, I’m concerned that it won’t help anybody.
In which I ask for your help
Yes, you! If you have ideas on what we did wrong, what we’re missing, or any other advice about getting the most “bang for the buck” out of a direct attached SSD on a converged-infrastructure Cisco UCS blade server platform with a VNX SAN, by all means, drop me a line. I’m all ears.
The problem isn’t so much that the role is vaguely defined. Although, depending on the size of the IT org and the tech stack, it can vary widely from a jack-of-all (DB Dev, report writer, production ops, the works) to a highly specialized performance tuner who works with 7 other teammates, each of whom has a unique surgical specialty in the data platform. But that’s not the problem — well, not the main problem. It is a problem in the sense that the business folks, especially HR, are notoriously and astonishingly ignorant of what a DBA or related role actually involves. But you get past that once you start talking to the tech leads and IT directors.
No, the problem is that, like most higher-level technical roles, you don't really know how a candidate is going to function in it (the role) without actually seeing him or her… IN it. Do they keep a cool head when production goes down? Do they have a solid plan of attack for the dreaded "everything is slow!" complaint-storm? Do they have a good handle on HA & DR architecture & implementation? Can you rely on them to actually practice and follow thru with those strategies? Can they be a continuous learner and keep abreast of new developments while still tempering that with wisdom & maturity, applying the correct tools to the proper problems? Do they try to add value to the team and organization by both teaching and learning from others?
These are truly difficult, complex questions that are nearly impossible to deeply assess and fully answer during an interview process. Largely because the only real evidence of their answers lies in actual experience. Sure, a cert here or an MVP there definitely helps your case. But at any rate, we try our best to chip away at the boulder.
Pivoting to a more positive note, I’ll share some of the better questions that I’ve experienced during my career so far.
Some good examples.
How would you design and build a data copy/sync process across/between tiered environments, say DEV-QA-PROD?
Really great question. This is a common problem in small-to-medium enterprises with legacy systems, where DevOps hasn't quite reached down to the depths of the internal application stacks and people are still dealing with "refresh cycles" on the order of months, quarters, or even years. You can approach it purely from a tooling perspective, but that's not the whole picture. Thus, it calls for some thought and team-culture ideas as well as "knowing the nerd-knobs".
We have a complex process flow that involves a lot of stored procedures, say 50 total. Some of these are non-sequential, meaning they can be executed in arbitrary order, while others need to be sequenced with each other in “blocks”. This is a vendor product, so ultimately, the customer gets to decide the schedule and order of execution of this flow. The solution needs to be maintainable by field engineers. How would you handle this?
Woah. Talk about diving down a rabbit-hole. This is interesting in the sense that it exposes a bit of the architecture and some of the potential pain-points that the team is hoping to solve, while leaving enough room for improvement and experimentation by the hopeful candidate. More to the point, it’s just an example of a more general technique, which to me is very effective: taking an architectural problem that actually comes from the “real world” (the company/team that’s interviewing) and asking for the candidate’s ideas on how to solve it. You don’t need to get in-the-weeds super-detailed about it, but outlining your ideas helps indicate how you think about complex challenges and shows what kind of value-add you would bring to the team.
And finally, a perennial favorite:
Tell me about a time you broke production, and more importantly, how you addressed and resolved it.
So many stories from the trenches involve downtime and mistakes, it’s good to ‘bond’ over them. It helps bring the egos back down to earth, and reminds us that we’re all just meatbags, making technology to do our bidding, occasionally to our own regret. It shows the candidate’s “pressure cooker” mentality, or at least, what they tell you about it.
If you’re a DBA, Dev, or IT pro, help your managers better understand your team’s needs when it comes to hiring. Get involved in the job description write-ups and screening process questionnaires. Barge your way into those ivory towers, if you have to — or you’ll regret the time you waste on candidates who really belong in a different role than the one you’re after.
If you’re a manager, PLEASE LISTEN to your reports and tech leads. They know what makes a good team member, they’ve been doing it for a long time. Don’t dismiss their advice or block them from being part of the hiring process — yes, they are busy, and yes, they can be crotchety, but their input is highly valuable toward bringing in effective & productive talent.
That’s all folks!
PS: I know I missed the “deadline” by about an hour..ish. I blame DST. Heck, it’s still Tuesday for the majority of the Western hemisphere. I’m not biased, but I write in English, so… ya know. Take it as you will. Now excuse me while I go hide from the blog-police in my ASCII-bunker.
..while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort..
At this point in my career, I'm not managing massive environments, so I don't feel the need (nor have the expertise) to use a large-scale solution like DSC or SCCM. But I've had to install SQL Server a few times, so I figured it's worth at least scripting out a standard pre-configured installation, so that A) I don't need to click through a GUI 'wizard' hearkening back to the '90s, and B) the SysAdmins can "fire and forget" (read: stop bugging me about it).
Thus, I’m attempting to K.I.S.S., while making it configurable & repeatable. There are some limitations of this approach, as alluded above. It’s not “massively scalable” (scaleable? scale-able?) because:
The PoSh script still needs to be deployed locally to the server in question
The installer config (.ini) also lives locally (though it probably could be a UNC path, it’s just a file after all)
The script prompts you for the service account (SQL engine, Agent) credentials and the sa password using the Read-Host -AsSecureString cmdlet, so some meatbag still has to type those in. This is because we don't have an enterprise pwd/secret-management system where I could, say, ask it for a service account credential set and tell it to embed that securely in a script without it actually being visible to me. So, while yes, they're kept in a "vault", it's not query-able by anything else, so an admin still needs to copy & paste them into whatever configuration screen he's working with at the time. Not ideal, I know, but we work with what we've got.
PS: Yeah, yeah, "don't use sa, rename it or disable it; or use Windows Auth only!". Rage, howl, fire & brimstone. I'm not going to argue about it; we can save that for another post. This environment dictates that it's used during setup and then disabled later, so that's beyond the scope of the installer config.
So yes, while there are likely even better ways to do this in the long-run, this quick & easy approach was sufficient to save me time and effort for the occasions when a new SQL box/VM needs to be spun-up.
A primer on SQL cmd-prompt installation & its arguments
And finally, two things that I attempted to understand but ultimately failed to implement, because (apparently, at least to me) PowerShell remoting is a P.I.T.A.
First we need an .ini file to work with. You could either create it from scratch, or take it from an existing SQL box’s “Setup Bootstrap” folder. Example path C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\20170801_073414\ConfigurationFile.ini — indicating this was an install done on 8/1/2017 at 7:34am. Right above that, at simply C:\Program Files\Microsoft SQL Server\130\Setup Bootstrap\Log\, you’ll see a Summary.txt file, which can actually come in handy while you’re testing these unattended installs and wanting to see why it failed.
The first link above, from MSFT Docs, does a pretty nice job of telling you all the things that make up this config file. You get to bypass the TOS prompt, enter service account details, specify drive letters (paths) for default data/log file locations & tempdb, slipstream update packages (UpdateSource), and even more advanced stuff like AG settings and whatnot. My example will be a simple standalone instance using the default name, so I’ll be sticking with the basics.
We can then use this file in the ConfigurationFile argument of setup.exe from the SQL Server install media. To put a little more color on that: the .ini file is really just a collection of command-line arguments to setup.exe; you could also list them all out in-line, but that would be tedious and silly. Here’s a couple major selling points of creating your own config file:
Slipstream updates (SP’s, CU’s), instead of having it go out to MSFT update servers (or *aghast* sticking with the original RTM bits, you heathen you!)
Specify drive letters / default file locations: sure, this may be considered old-hat if you’re running super slick storage, but I still find it makes management a bit easier if I know where my MDFs, LDFs, TempDB, & backups will always be.
Take advantage of 2016’s better TempDB setup options (# files, size & growth)
We will, however, keep a couple arguments out of the .ini file and instead throw them into the ArgumentList from the calling PowerShell script. Speaking of, here’s what the PowerShell script needs to do:
Prompt the operator (SysAdmin or DBA) for the SQL & Agent service account credentials, and (optionally) the sa pwd (if using it).
Fetch our install media from the central network share where we store such things (server & office ISOs, for example).
Mount said ISO to our virtual disc drive.
Run its setup.exe with the following arguments:
The config .ini file
The service & sa accounts
After it’s done, un-mount (dismount) the ISO.
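Stitched together, the script's skeleton looks something like this. It's a sketch, not the full Gist: the ISO share path and file names are assumptions, error handling is omitted, and /SAPWD only applies if the .ini sets SECURITYMODE="SQL":

```powershell
$isoPath    = '\\FileServer\ISOs\SQLServer2016.iso'   # hypothetical share
$configFile = Join-Path $PSScriptRoot 'ConfigurationFile.ini'

# 1. Prompt the operator for credentials (SecureString, so nothing echoes)
$svcCred = Get-Credential -Message 'SQL & Agent service account'
$saPwd   = Read-Host -AsSecureString -Prompt 'sa password'

# 2 & 3. Fetch/mount the install media and find its drive letter
$image = Mount-DiskImage -ImagePath $isoPath -PassThru
$drive = ($image | Get-Volume).DriveLetter

# 4. Run setup.exe with the .ini plus the sensitive, non-ini arguments
$plainSvc = $svcCred.GetNetworkCredential().Password
$plainSa  = [System.Net.NetworkCredential]::new('', $saPwd).Password
$argList  = @(
    "/ConfigurationFile=$configFile"
    "/SQLSVCACCOUNT=$($svcCred.UserName)"
    "/SQLSVCPASSWORD=$plainSvc"
    "/AGTSVCACCOUNT=$($svcCred.UserName)"
    "/AGTSVCPASSWORD=$plainSvc"
    "/SAPWD=$plainSa"
)
Start-Process -FilePath "$($drive):\setup.exe" -ArgumentList $argList -Wait

# 5. Dismount the ISO when setup finishes
Dismount-DiskImage -ImagePath $isoPath
```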
Then the DBA can connect to the brand-spankin'-new running SQL instance and do other post-setup configurations as desired (i.e. set max-memory, maxDOP/CTFP, etc.). And sure, those could also be done in PowerShell (thanks in no small part to the awesome team behind dbatools), but I chose not to do so in this case.
As the bloggers say, “that’s left as an exercise to the reader”.
Plus, they’re never quite as deterministic as we’d like them to be — they depend on the server’s compute resources, i.e. memory size & CPU cores, as well as estimated workload & environment tier, so it’s often a gamble in “how correct” your initial settings will be anyway. Still, anything is better than the defaults, so configure-away!
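For illustration, those post-setup knobs look like this in T-SQL. The values are examples only, for exactly the reasons just mentioned; size them to the box's RAM, cores, and workload:

```sql
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 28672;        -- e.g. 32 GB box, leave ~4 GB for the OS
EXEC sp_configure 'max degree of parallelism', 4;         -- maxDOP
EXEC sp_configure 'cost threshold for parallelism', 50;   -- CTFP
RECONFIGURE;
```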
Here are the Gists I’ve created to go along with this post. If I’ve made a mistake, or if you, dear reader, have a suggestion, we can incorporate them into the gist without me having to go back and edit the blog post!
I’d love to get feedback on how you would improve this, what you might do differently, etc. Drop me a comment or a tweet!
; There are THREE static placeholders you need to replace/type when you use this:
; 1. <YOUR UPDATES/PATCHES FOLDER>, which is for slipstreaming whatever service-pack(s) and cumulative-update(s).
; 2-3. <YOUR DOMAIN> and <YOUR DBA GROUP>: for specifying domain account(s) which are given SQL sysadmin ('sa') rights,
; hence, usually your DBA group. You could move that part out to the calling script instead, if you wanted,
; but I chose to leave it here because it's not likely to change much within an environment.
; NOTE: the parameter values below are illustrative defaults -- adjust for your environment.
; required first line
[OPTIONS]
; Specifies a Setup work flow, like INSTALL, UNINSTALL, or UPGRADE. This is a required parameter.
ACTION="Install"
; Specifies that SQL Server Setup should not display the privacy statement when run from the command line.
SUPPRESSPRIVACYSTATEMENTNOTICE="True"
; Use the /ENU parameter to install the English version of SQL Server on your localized Windows operating system.
ENU="True"
; Setup will run silently, logging to files in the Setup Bootstrap directory
QUIETSIMPLE="True"
; Specify whether SQL Server Setup should discover and include product updates. The valid values are True and False or 1 and 0. By default SQL Server Setup will include updates that are found.
UpdateEnabled="True"
; If this parameter is provided, then this computer will use Microsoft Update to check for updates.
;UseMicrosoftUpdate="True"
; Specifies features to install, uninstall, or upgrade. The list of top-level features include SQL, AS, RS, IS, MDS, and Tools. The SQL feature will install the Database Engine, Replication, Full-Text, and Data Quality Services (DQS) server. The Tools feature will install shared components.
FEATURES=SQLENGINE
; Specify the location where SQL Server Setup will obtain product updates. The valid values are "MU" to search Microsoft Update, a valid folder path, a relative path such as .\MyUpdates or a UNC share. By default SQL Server Setup will search Microsoft Update or a Windows Update service through the Windows Server Update Services.
UpdateSource="<YOUR UPDATES/PATCHES FOLDER>"
; Displays the command line parameters usage
HELP="False"
; Specifies that Setup should install into WOW64. This command line argument is not supported on an IA64 or a 32-bit system.
X86="False"
; Specify a default or named instance. MSSQLSERVER is the default instance for non-Express editions and SQLExpress for Express editions. This parameter is required when installing the SQL Server Database Engine (SQL), Analysis Services (AS), or Reporting Services (RS).
INSTANCENAME="MSSQLSERVER"
; Specify the root installation directory for shared components. This directory remains unchanged after shared components are already installed.
INSTALLSHAREDDIR="C:\Program Files\Microsoft SQL Server"
; Level to enable FILESTREAM feature at (0, 1, 2 or 3).
FILESTREAMLEVEL="0"
; Specifies a Windows collation or an SQL collation to use for the Database Engine.
SQLCOLLATION="SQL_Latin1_General_CP1_CI_AS"
; Set to "True" to enable instant file initialization for SQL Server service. If enabled, Setup will grant Perform Volume Maintenance Task privilege to the Database Engine Service SID. This may lead to information disclosure as it could allow deleted content to be accessed by an unauthorized principal.
SQLSVCINSTANTFILEINIT="True"
; Windows account(s) to provision as SQL Server system administrators.
SQLSYSADMINACCOUNTS="<YOUR DOMAIN>\<YOUR DBA GROUP>"
DBA does not mean “database archaeologist”, even though sometimes that’s what you end up doing…
Aka “Dammit Jim, I’m a DBA, not a data researcher!” Or, as I stated on Twitter:
DBA != Database Archaeologist
Today I spent more hours than I care to admit, tracking down some obscure data from two disparate systems in an attempt to reconcile what were supposed to be matching records. Part of that is my own fault — I’m a sucker for interesting problems and edge cases, as I’ve blogged about before…
mostly just for the smug satisfaction of proving to the business that "your assumptions about how your data works are invalid".
But mostly it’s because, the further back in time you go, the less reliable the data becomes. Especially (exponentially) when that data originates from human free-form text input.
Let’s contrive an example. We have our core business product system, WidgetMaster, which tracks Widgets we ship out by WidgetNumber. Our partner associate business runs an online widget exchange where people can buy and sell their Widgets in a sort of second-hand/after-market fashion. PartnerExchange listings are supposed to include the WidgetNumber for ease of tracking and associating data between the two systems, even though they’re officially run by different companies (or in my case, different departments of the same company — yeah, think about that for a second).
Now, ideally, theoretically, those WidgetNumbers should always match up. But unfortunately, up until late 2014, our WidgetMaster system didn't have an API that PartnerExchange could call to obtain a widget by its number; and even if we did, they have to at some level rely on the customer (or a worker) to read and enter that WidgetNumber into the exchange listing. But wait, we started doing bar-codes back in 2010, so "most" of them are actually scanned from the bar-code, but not every customer has that capability, so there's still a lot of hand-entered data.
So we have some dirty data. Let’s complicate things a bit. Over time, those widgets can come back to WidgetMaster for update/upgrade and then ship back out. Again, WidgetNumber should remain consistent throughout that process. Now, when PartnerExchange sells certain particular widgets, sometimes they’re part of a SuperSpecialCollection. This collection spans many years, maybe even a decade or more. WidgetMaster got wind of this SuperSpecialCollection, being bought-up by Mr. HighRollerCustomer, so we started marking the incoming/outgoing records with a new property.
But it’s text.
It’s entered by the receiver, based on looking at the Widget’s buy/sell history in PartnerExchange. And yes, the HighRollerCustomer who last bought the widget is aware that it’s part of their SuperSpecialCollection, but they aren’t guaranteed to specify that when they send the widget back in to WidgetMaster for upgrade.
Do we see the problem yet?
See, about 5 years ago, there was a reorg, and the dev team for WidgetMaster completely revamped the way in which "collection membership" for incoming widgets is designated/tracked. So now it's over in some property table. To make matters worse, PartnerExchange renamed SuperSpecialCollection to AwesomeCltn a few years ago because they were tired of typing so many letters (and apparently fans of cryptic abbreviations).
Fortunately, PartnerExchange has done a decent job of at least storing the correct WidgetType and WidgetQuality in their listings, despite WidgetNumbers being fairly sparse. But again, because over in WidgetMaster, we’re supposed to associate each WidgetNumber with the AwesomeCollection, we now have this secondary task of mapping unmatched WidgetNumbers across systems, by using Type and Quality from one side (partner) combined with Collection-membership from the other side (master), by assuming that the partner’s designation of SuperSpecial/AwesomeCollection is correct.
If your head’s not spinning yet, give yourself a round of applause. While rubbing your tummy and tapping your foot.
Needless to say, this is hard. We’ll probably get the majority of records matched (mapped?) eventually by using a couple string LIKE predicates and some clever try/pass/retry flow, but it’s tedious at best. Another bit of frustration will come up when we do a couple ad-hoc searches thru each system to attempt to apply reason and logic, i.e. find a pattern; and because we’ve already done the work, we might as well put that info into our results, even if it doesn’t show us a useful pattern by itself.
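To give a flavor of that first pass, here's a hedged sketch. Every database, table, and column name below is hypothetical, standing in for the two real systems:

```sql
-- First-pass idea: for intake records flagged as part of the collection
-- (under either of its names), find exchange listings that agree on
-- Type + Quality, since WidgetNumber is unreliable/sparse over there.
SELECT i.WidgetNumber, i.WidgetType, i.WidgetQuality,
       l.ListingId, l.WidgetNumber AS ListingWidgetNumber
FROM WidgetMaster.dbo.WidgetIntake AS i
JOIN PartnerExchange.dbo.Listing   AS l
  ON  l.WidgetType    = i.WidgetType
  AND l.WidgetQuality = i.WidgetQuality
WHERE (i.CollectionName LIKE '%SuperSpecial%'
   OR  i.CollectionName LIKE '%Awesome%')      -- covers the rename
  AND l.WidgetNumber IS NULL;                  -- only the unmatched leftovers
```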
So how do we approach this? Well, as I said, I spent what I felt was too much time on it, but essentially I did an initial "majority rules" mapping attempt (first pass), followed by a few reconciliation attempts for the remainders. Those consisted of fairly identifiable patterns with a couple outliers. With those outliers, as with the rest of the unmapped records at the end of the day, I had to tell the business, basically, "Here's the majority of the results. You can assign a research specialist or analyst to the rest, if you feel it's that important."
I may have done this with slightly more attitude than necessary.
How could we improve this? The bigger geeks in the room may pipe up with “machine learning!” Ok sparky, do you have that infrastructure ready to go? No? How long for implementation? mumble mumble something about Azure mumble… Okay, sure, how about training the model so you can feed it your data? Cool, well enjoy the incredulous laugh you’ll get when you tell the manager that.
How about other tool sets? Sure, we could check out Python or R, write a C# app maybe? Well guess what, we still need to look at the data to understand what patterns (or lack thereof) there are to work with. And again, lead time. Unfamiliar tools means longer development cycles. And they’re really not guaranteed to produce any better results (more matches) at the end of the day, are they?
Because your data models and your analyses are only as good as the data itself.
And with that, I’ll call it a day. Thanks for reading!
How it lost its original clustering key is a perfect topic for this month’s T-SQL Tuesday!
As previously referenced, we had a half-billion row table (which we'll "round up" to a billion, just for the sake of argument) that needed an index. A clustering key, to be exact. How it lost its original clustering key is a perfect topic for this month's T-SQL Tuesday (we planned to replace it, before realizing how long it would take, and had to cancel the job after the old index was removed but before the new one could finish being added). Anybody can drop an index; it takes mere milliseconds. But to create (or rebuild) an index, the SQL engine needs to touch every single row that index will include, and of course for a clustered index, that's errverybody.
When planning an index replacement for a billion-row table, don’t just schedule the job as “drop old index, then create new index“, and expect nothing to go wrong.
Don't try to index a billion-row table all at once. Use a smarter methodology. Unless you can literally afford to have that table be offline for several hours (depending on your hardware, and assuming you're on Standard Edition).
Of course, some of you crazy kids with Expensive Edition are scoffing and thinking “Oh that’s nothin’!”, with your billions of rows and online index rebuilds. Well you can go back to your fancy Always Encrypted data and your terabytes of RAM and just pretend I said “trillion” instead of “billion” — maybe that’ll fit your scale a bit better. But for the rest of us mere mortals…
Anyway, since we’re peons, we can’t just go create the billion-row index without taking the table offline. So we need to come up with a way to minimize that downtime for the table, and avoid causing excessive blocking or I/O overload on the instance. As I said before, SSIS to the rescue!
I call this the "setup, dump, & swap". Essentially we need to create an empty copy of the table, with the desired index(es), dump all the data into it, and then swap it in. There are a couple of ways you can do this, but it boils down to the same basic premise: It's "better" (probably not in terms of speed, but definitely in terms of efficiency and overhead) to fill this new copy of the table & its indexes, than it is to build the desired index on the existing table.
So here’s an outline:
Script out the table (SSMS, right-click, script table, create to… or, if you use one of these cool extensions, F12 or similar "get definition" shortcut) — say, if the original is MyTable, script-create & replace MyTable with MyTableCopy
Here’s a little room for choice. You could create the table in the same schema with a new name; or, you could create the table in a different schema, with the same name or another name. This will determine how you do the “swap” toward the end.
In the first case, we’d use sp_rename
In the 2nd, we’d use alter schema transfer
Both are essentially meta-data changes, but the latter could potentially take a hair longer just because it pulls more strings; whereas the former requires taking care of dependencies ahead of time so you don't break a schema-bound view or orphan a foreign key.
Add the desired index(es), e.g. create clustered index CX_MyTableCopy_Datestamp_ThingName on dbo.MyTableCopy (Datestamp, ThingName)
Build the SSIS task to copy the data from MyTable to MyTableCopy
Schedule said task via SQL Agent.
Use the previous tip to monitor its progress and estimate time to completion!
Once done, prepare to swap!
Again, use sp_rename, and if needed, alter schema transfer.
Likely, this will involve several renames – the constraints and other indexes can’t be named the same either, so get all that stuff renamed with an _old suffix first, then you can swap the actual tables.
Clean up after yourself and drop the old table once you verify everything’s working well and all dependencies are accounted for.
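The outline above can be sketched in T-SQL. The object and column names here (MyTable, MyTableCopy, ID, Datestamp, ThingName) are hypothetical stand-ins, not the real schema:

```sql
-- Sketch of the "setup, dump, & swap" -- all names are hypothetical.
-- 1) Setup: an empty copy of the table, with the desired clustered index.
CREATE TABLE dbo.MyTableCopy (
    ID        int          NOT NULL,
    Datestamp datetime2    NOT NULL,
    ThingName varchar(100) NOT NULL
);
CREATE CLUSTERED INDEX CX_MyTableCopy_Datestamp_ThingName
    ON dbo.MyTableCopy (Datestamp, ThingName);

-- 2) Dump: handled by the SSIS task (bulk insert, configurable batches).

-- 3) Swap: the rename-based variant; meta-data operations only.
BEGIN TRAN;
    EXEC sp_rename 'dbo.MyTable', 'MyTable_old';
    EXEC sp_rename 'dbo.MyTableCopy', 'MyTable';
COMMIT;

-- Or the schema-transfer variant, if the copy lives in a staging schema:
-- ALTER SCHEMA old TRANSFER dbo.MyTable;
-- ALTER SCHEMA dbo TRANSFER staging.MyTable;
```

Remember the earlier caveat: rename any same-named constraints and indexes (with an _old suffix or similar) before swapping the tables themselves.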
Let's back up a bit. Why are we doing it this way? More importantly, what are the general use-cases for this kind of thing? Well as I said, it's "better" in certain ways, than simply creating the index on the "live" T.O.U.S. It avoids locking said live table, and it has the potential to put less stress on the database and less synchronous I/O against its primary files. Finally, and most importantly, as a bulk insert operation, with configurable batch sizing, the SSIS task will put dramatically less load on the transaction log. Whereas, with the regular inline index creation, it could easily fill up the Tlog and will definitely cause performance issues.
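If SSIS isn't an option, the same batched-dump idea can be approximated in plain T-SQL. A rough sketch, assuming a unique ID column on the hypothetical MyTable (a real implementation would walk an ascending key range instead of the anti-join, which gets slow at scale):

```sql
-- Batched copy in plain T-SQL: each loop iteration is its own small
-- transaction, which keeps Tlog usage and blocking to a minimum.
DECLARE @BatchSize int = 100000, @Rows int = 1;
WHILE @Rows > 0
BEGIN
    INSERT INTO dbo.MyTableCopy (ID, Datestamp, ThingName)
    SELECT TOP (@BatchSize) s.ID, s.Datestamp, s.ThingName
    FROM dbo.MyTable AS s
    WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTableCopy AS c
                      WHERE c.ID = s.ID);  -- assumes ID is unique
    SET @Rows = @@ROWCOUNT;
END;
```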
Furthermore, swapping tables, or in more advanced cases, partitions, is a fantastic way to move massive amounts of data around with minimal impact. It also happens to be quite helpful when we have to deal with indexing such massive amounts of data.
In my case, this was actually an archive of 2016 audit-trail data, so, had I chosen, I could have taken it out of the partitioned view it was part of, and dealt with it being offline for several hours. But because I'm a sucker for a good problem, and because I wanted to be able to semi-accurately monitor the progress to let the boss-man know when it might be done, I chose to take this route instead. It'll come in handy again soon, I'm sure — we have several other jumbo-tron tables lying around that may need some index tuning and/or partitioning.
So, happy swapping, and don’t let those T.O.U.S.‘s get the best of you!
PS: thanks to @SQLDoubleG for hosting, and thanks in advance to the rest of the community being lenient of my “submission” being a re-purposed existing post with a few extra blurbs thrown in to bring it on-topic! =D
Another teaser, just to keep you waiting… I promise I’ll finish it soon!
This hack is about change buildup/bloat. When you ignore a particular DB’s schema drifts for a while, the amount of changes that the “Review” page needs to show gets larger and larger. Until, eventually, there’s simply too much content for it to handle, and it dies. This is a bummer, because you want to ack those changes, but you don’t necessarily need to see them all in detail, as long as you have a general idea of what they were about.
So, this will be a tale of how I purged a set of un-acknowledged changes from DLM Dashboard for a specific Server/Database. RG support, bless their heart, wanted me to go to the document repo (which, if you’re not aware, uses a RavenDB back-end stored on your local file system) and purge everything. But I said no, I think we can do better than that.
Plus, if you really really care about those changes, you can use the RavenDB Studio (GUI) to dump the documents to CSV — i.e. if this tool is part of your audit/compliance toolbox and you don’t want to lose those changes, but you need to be able to “move on with life” by acking the changes to the DB so that you can resume checking it on a regular basis without having a page-hang/crash.
This will be screenshot-heavy, censorship-heavy (to avoid divulging company secrets!), and meme-free (I know, I know, some of you will thank me and others will be a sad panda).
This is a work-in-progress, but I thought I’d put it out there because it may be interesting to others who use the tool.
This is a "teaser" in that I'm not done writing the content yet, but I wanted to get it out there as "something I'm working on". The problem at-hand is that we have a table trigger which is constantly being disabled/enabled by a stored-proc, which in turn controls the only acceptable method of updating the columns that would otherwise be forbidden from update by said trigger. Clear as mud? K. Basically there's a schema change ("drift") of a certain specific type that I want DLM Dashboard to always ignore on this DB.
SELECT TOP 100 * FROM RedGate.SQLLighthouse.DDL_Events de
ORDER BY de.PostTime
SELECT TOP 100 * FROM RedGate.SQLLighthouse.DDL_Events de
WHERE de.appname NOT IN ('SQLServerCEIP', 'Spotlight Diagnostic Server (Monitoring)')
AND de.appname NOT LIKE 'SQLAgent%'
ORDER BY de.PostTime DESC
ALTER TABLE SQLLighthouse.DDL_Events DROP CONSTRAINT PK__DDL_Even__3213E83FB62EF9A3;
ALTER TABLE SQLLighthouse.DDL_Events ADD CONSTRAINT PK_DDL_Events PRIMARY KEY NONCLUSTERED (id);
-- Key-column lists below are inferred from the index names; verify against your DDL_Events schema.
CREATE CLUSTERED INDEX CX_DDL_Events_PostTime ON SQLLighthouse.DDL_Events (PostTime);
CREATE INDEX IX_DDL_Events_SPID ON RedGate.SQLLighthouse.DDL_Events (spid)
    INCLUDE (transaction_id, options, nestlevel);
CREATE INDEX IX_DDL_Events_AppName ON RedGate.SQLLighthouse.DDL_Events (appname);
CREATE PRIMARY XML INDEX XI_DDL_Events_EventData ON SQLLighthouse.DDL_Events (eventdata);
CREATE XML INDEX XI_DDL_Events_EvenData_PATH ON SQLLighthouse.DDL_Events (eventdata)
    USING XML INDEX XI_DDL_Events_EventData FOR PATH;
CREATE XML INDEX XI_DDL_Events_EvenData_PROPERTY ON SQLLighthouse.DDL_Events (eventdata)
    USING XML INDEX XI_DDL_Events_EventData FOR PROPERTY;
I enjoyed the conversation, because it really got us both thinking more deeply about which areas of our app landscape are in better/worse shape than others.
After the previous discussion about nested views, encapsulation & abstraction, I’d like to write about duplication specifically, and the distinction between actual lines of code being duplicated, versus functional duplication. Because the latter is not OK, but the former is generally acceptable when it’s boilerplate code, or done, again, in the name of performance and efficiency.
So, to expand on last week’s “Encapsulation & Abstraction” segment. The conversation with one of my favorite developers went something like this.
While I agree that there’s some over-reliance on nested views, the reason they get implemented a lot is because there’s a particular problem they seem to easily solve: how to encapsulate business-data rules without violating DRY.
Let’s say we have a biz-rule for a “core segment” of data. For simplicity’s sake, let’s call that rule “Widget A consists of a Widget record and a WidgetSupplement record joined by WidgetID, where Widget.WidgetType is ‘foo’.” So it seems obvious to create a view WidgetFooComplete, which pulls in the data from both tables and applies the type condition. This creates a sort of “atomic building block” of data, which can be consumed by apps & data-access methods repeatedly & consistently.
Now, most downstream apps will use that WidgetFooComplete data in its entirety (or nearly). But let’s say there’s a hot new app that needs some more data about the Widgets, and it needs to go out to the WidgetMoarProperties table. The natural inclination is to incorporate our existing “building block” view, WidgetFooComplete, into a new view for this app & its dependencies, and call it WidgetFooMoarComplete.
But what’s the alternative? If we re-create the JOIN/WHERE conditions on the base-tables in this new view, it violates DRY and makes possible future refactoring difficult if that biz-rule changes.
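To make the example concrete, here's a minimal sketch of the two views; column names beyond WidgetID and WidgetType are made up for illustration:

```sql
-- The "atomic building block" view: the biz-rule lives in exactly one place.
CREATE VIEW dbo.WidgetFooComplete AS
SELECT w.WidgetID, w.WidgetType, ws.SupplementInfo  -- SupplementInfo is hypothetical
FROM dbo.Widget AS w
JOIN dbo.WidgetSupplement AS ws ON ws.WidgetID = w.WidgetID
WHERE w.WidgetType = 'foo';
GO
-- The new app's view, built on the block -- convenient, but now one level of nesting deep.
CREATE VIEW dbo.WidgetFooMoarComplete AS
SELECT wfc.WidgetID, wfc.WidgetType, wfc.SupplementInfo,
       wmp.MoarStuff                                -- MoarStuff is hypothetical
FROM dbo.WidgetFooComplete AS wfc
JOIN dbo.WidgetMoarProperties AS wmp ON wmp.WidgetID = wfc.WidgetID;
```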
Admittedly, most modern data-access technologies make it easier to create these “building blocks” of joined data entities. And sometimes those biz-rules belong in the app’s lower layers, but this can lead to writing lots of little disparate queries/db-calls for what should have been one atomic operation. That can be a maintenance headache, as could dozens (hundreds) of tailored stored-procs for every data-access scenario.
So it seems like nested views can have their place, but “deep” nesting is usually troublesome. And to prevent the “slippery slope” effect, we have to practice diligence.
That’s pretty spot-on. DBAs tend to criticize them (nested views) as a practice in general because of the tendency to over-use and over-rely on them, and because of that slippery slope, where “a little” use turns into “a lot”, and leads to troubleshooting headaches. And generalizations are just that.
To take some examples in-hand: simple entity relationships, especially when biz-critical, should be A) obvious, and B) documented. Unified views can serve this purpose, but should only be used where appropriate — i.e. to load an object that gets passed around/up the app stack. They’re great “atomic building blocks” when you actually need the entire block of data. But when you don’t — say in a stored-proc that’s doing some data flow operation and only needs a small subset of that data block — it’s probably better to get the relationship logic from the view and copy-paste it (but hopefully not all of it!), while omitting the stuff that’s not needed.
The main reason for this is usually index tuning. If we've crafted some indexes to meet certain query patterns in certain troublesome procs, we want those procs to use those indexes, not just do a full table scan because they're using a nested-view which does select *.
When we get to more complex business rules, we need to up our diligence game and be more mindful of dependency checking when planning for a rule change. Proc comment-headers can be helpful here, as can tools that search thru SQL object meta-data and code-bases to produce dependency chains.
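For the meta-data route mentioned above, SQL Server's own catalog views can produce a first-pass dependency list. A minimal sketch, using a hypothetical table name:

```sql
-- First-pass list of objects (views, procs, functions) that reference a table.
-- Note: catalog-based dependency tracking misses dynamic SQL, so treat this
-- as a starting point, not a complete answer.
SELECT OBJECT_SCHEMA_NAME(d.referencing_id) AS SchemaName,
       OBJECT_NAME(d.referencing_id)        AS ObjectName
FROM sys.sql_expression_dependencies AS d
WHERE d.referenced_entity_name = 'Widget';  -- hypothetical table name
```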
The main point is, duplication tends to be OK when it’s not functional duplication, i.e. when the SQL code is more-or-less similar in some places but it’s not exactly the same because the purpose (responsibility) of that module/stored-proc is not the same.
You’re right in that the “31-flavors of tailored procs for data-access” is a big maintenance headache, and sometimes that trumps even the performance concerns. Again it’s about balance — we have to be mindful of both the biz-rule-maintenance concerns and the performance concerns.
I figured. Sometimes I see DBAs criticize developers’ work without seeming to understand that it doesn’t always come from sloppiness or laziness (although sometimes it does!). Often, we’re trying to thread that needle of performance vs. maintainability. In Dev-land, “lazy” is good in the sense of aiming for simplified logic, for ease of both maintenance and understanding. Painstakingly tailoring each data-access call (stored-proc), while good for performance, is kinda opposite of that. But, admittedly, we do fall back on SELECT * all too easily.
Mostly, we try to avoid code duplication because it leads to heavier maintenance overhead. When some modules may perform similar operations, functionally, they will often re-use the same “core” logic, which we in turn encapsulate into its own ‘thing’. But in SQL modules, as you say, that’s not always performant, so it’s definitely a tightrope-walk.
The “Clean Code” school of thought says, if it’s obvious, it’s “self-documenting”. I don’t always agree with it, but it comes from maintenance concerns again. We don’t like situations where someone tweaks the code but doesn’t update the comments, and you end up with misleading comments. Unfortunately, it does come down to diligence again, and even “good” developers will easily fall back to rarely including comments just to avoid this situation. Of course, another potential pitfall of supposedly self-documenting code is, what’s “obvious” to one person isn’t necessarily so to everyone else!
(We both enjoy writing, can you tell?) =P
So basically we agreed to “moderation in all things” and exchanged Buddha statues and sang Kum-Bay-Yah. I enjoyed the exchange because it really got us both thinking more deeply about which areas of our business/app landscape are in better/worse shape than others.
Part 3: Misusing & Abusing Datatypes
Because I’m getting long-winded again, let’s wrap up with a final “Clean SQL Code” topic that’s short & sweet.
Well, not really. There are entire presentations dedicated to this topic. But I'll try to keep it condensed.
A date is not a datetime is not a time is not a time interval. Okay? For the third time, stop interchanging them! Yes I know, SQL Server is a bit behind some other RDBMS platforms when it comes to this stuff. Sorry, I don't work for Microsoft. I just deal with their tech.
More to the point, know your data. Understand that there can be consequences to repeatedly casting types, or losing precision during conversion, sometimes exponentially so. Yes I know, we all love loosely-typed (sometimes stringly typed) languages like JS & Python. Those are wonderful tools for certain jobs/problems. Again, be mindful and know your flows.
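A quick illustration of the kind of silent precision loss meant here:

```sql
-- datetime2 carries 100ns precision; datetime rounds to ~3ms increments,
-- and date drops the time portion entirely. All of this happens silently.
DECLARE @dt2 datetime2(7) = '2017-09-12 10:30:00.1234567';
DECLARE @dt  datetime     = @dt2;  -- rounded to .123 or .127 (3ms ticks)
DECLARE @d   date         = @dt2;  -- time-of-day is gone
SELECT @dt2 AS original, @dt AS as_datetime, @d AS as_date;
```

Round-trip a value through the wrong type enough times (say, via an app layer that stores everything as datetime), and those little losses compound.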