90 seconds to your first pivot table in Excel…

Image by Free-Photos from Pixabay

If you are new to the basic functionality and uses of spreadsheet software like Excel, this series is meant to provide you with a few quick examples to get you comfortable with loading data, cleaning data, and transforming data into meaningful insights. This is meant to be no-frills, rapid instruction, so let’s jump in.

Loading Data:

Today I’m using a data set entitled “COVID-19 Community Vulnerability Crosswalk — Rank Ordered by Score” from HealthData.gov. Loading a data set like this can be completed in a few simple ways.

  1. Directly open your downloaded…

In under 5 minutes!

Image by Pexels from Pixabay

Spreadsheets are highly underrated. Sure, for serious data exploration and ML projects I’m turning to Python or R, but if I just want to rapidly build out a dashboard to tell a story for a presentation, Excel is often a reasonable option. So, here’s how to build a dashboard in under 5 minutes.

Data Source: Time Series of ICU Occupied Beds By State

Initial Data Set:

Image by Kranich17 from Pixabay

Adjective, noun, and verb. Taken from Proto-Germanic “batizo”, closely related to “battle”. Improving on something through a struggle which results in growth, maturity, or improvement. Often requiring unforeseen cost of time, resources, and commitment. Grown from the intrinsic need to become “better” at something or than someone else.

When is “better” worth such a cost? Vilfredo welcomed us to consider 80% to be “better” than chasing after the minuscule gains of the 20%, further pointing out that every time something becomes “better”, something or someone else becomes “worse”.

A qualitative measure of value often mistaken for improvement if not taken…

Document a pragmatic data dictionary in under 10 minutes!

Image used with permission by Yogesh More from Pixabay

Don’t you wish that all of our data sets came with a nice synopsis to tell us their story? If you are a data scientist, then you can appreciate how much easier it is to work with well-documented data. Practically speaking, it’s rare to find clean, well-documented, and complete data when starting a new project. That generally leads to one of two outcomes:

  1. You use the data as is, wade through understanding it to push out a semi-complete project and then hope that you remember what it all meant…

Using Python to Download Multiple PDFs Quickly…

Photo by: Danni Simmonds (freeimages.com)

Sometimes finding data feels a lot like taking on a mountain. Recently I came across some data related to international adoption on Travel.State.Gov. The data was laid out nicely in Plotly, but not in the way I wanted to look at it. Certainly they provided a raw data download link? Nope, just a link to annual PDF reports that contain the data I need. Doh!

It hurts my heart a little to realize that Jerry Maguire came out in 1996. Maybe if the movie were made today, 40-somethings all over the world would shout the mantra, “SHOW ME THE DATA!”

If you are a data scientist looking for data, there are only so many free resources to download before you realize that you need to learn how to scrape web pages.

Before we start, let me state that web scraping should be done responsibly, with as little impact as possible on the host servers, and we all should be respectful of others’ creative…

The pythonic way…

Photo by Colin Nixon at www.freeimages.com

Have you ever scanned a document into a PDF as an image and then later realized that you actually needed to be able to edit the document? Adobe has built-in optical character recognition (OCR) software that can make for an easy fix, if you have Adobe Professional. If you don’t have this luxury but have a few minutes, keep reading.

What you need…

  1. Python3
  2. Tesseract OCR: sudo apt-get install tesseract-ocr
  3. These Python libraries: wand, Pillow, pyocr, PySimpleGUI

Set up your virtual environment with your Python version of choice, install the libraries, and run the code:
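The full listing belongs to the original article. As a stand-in, here is a minimal sketch of how the pieces listed above could fit together, not the author's script. It assumes wand can read PDFs on your machine (which needs ImageMagick and Ghostscript installed), that Tesseract is the first tool pyocr finds, and it leaves out the PySimpleGUI file picker. The function names (`text_output_path`, `join_pages`, `ocr_pdf`, `pdf_to_text_file`) are hypothetical.

```python
import io
from pathlib import Path


def text_output_path(pdf_path):
    """Return the .txt path that sits next to the given PDF."""
    return Path(pdf_path).with_suffix(".txt")


def join_pages(page_texts):
    """Join per-page text with a form feed so page breaks stay visible."""
    return "\n\f\n".join(text.strip() for text in page_texts)


def ocr_pdf(pdf_path, lang="eng", resolution=300):
    """Rasterize each PDF page with wand and run OCR on it via pyocr."""
    # Third-party imports live here so the helpers above work without them.
    import pyocr
    import pyocr.builders
    from PIL import Image as PilImage
    from wand.image import Image as WandImage

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is tesseract-ocr installed?")
    tool = tools[0]  # assumption: Tesseract is the first available tool

    page_texts = []
    with WandImage(filename=str(pdf_path), resolution=resolution) as pdf:
        for frame in pdf.sequence:
            # Convert one page to PNG bytes, then hand it to Pillow.
            with WandImage(image=frame) as page:
                png_bytes = page.make_blob("png")
            pil_page = PilImage.open(io.BytesIO(png_bytes))
            page_texts.append(
                tool.image_to_string(
                    pil_page, lang=lang, builder=pyocr.builders.TextBuilder()
                )
            )
    return join_pages(page_texts)


def pdf_to_text_file(pdf_path):
    """OCR a scanned PDF and write the text next to it as a .txt file."""
    out_path = text_output_path(pdf_path)
    out_path.write_text(ocr_pdf(pdf_path), encoding="utf-8")
    return out_path
```

Calling `pdf_to_text_file("scan.pdf")` (a hypothetical file name) would write the recognized text to `scan.txt`. A higher `resolution` usually helps recognition on small print at the cost of speed.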


photo “data” by CyberHades is licensed under CC BY-NC 2.0

As a physician data scientist and healthcare administrator, one of the frequent complaints I hear from other data scientists is that it is difficult to get clinicians to accept the validity of their “new prediction tool”. While I personally feel that the perception of the clinical community is shifting towards embracing big data and predictive analytics, I am also acutely aware that there is indeed an environment of mistrust between clinicians, administrators, and data analysts/scientists. What steps can we take to change these perceptions and shift towards an environment of collaboration? Here are my thoughts.

1) Clinicians and Data Scientists speak different languages.

Over 77% of clinicians…

photo by Mercelo Gerpe at freeimages.com

Finding the right IDE is like stepping into a “The Three Bears” storybook. This one is too simple, this one is too complicated, this one is ‘Just Right’…except for the annoying fact that it’s using Python 2.7.

At least that was my issue with Sublime. I’ve tried other editors and they all work fine, but for some reason I like the look and feel of Sublime. In an attempt to rectify my issue, I of course turned to DuckDuckGo and found the internet was largely silent on an answer. …

By Dietmar Rabich, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=38134508

I’m a little bit of an efficiency freak. Little things that make more work drive me a little…well…crazy. Things like uncommented code, improperly labeled files, inconsistent use of camelCase/PascalCase/underscore naming conventions, or having the toilet paper on the roll backwards really, really annoy me (it should roll over the top!). From a data science standpoint, my biggest pet peeve, outside of the use of Excel as a document program, is untidy data. If you don’t know what tidy data is, read this!

With that off my chest, I was playing with some data related to worldwide Systolic Blood Pressure…


Husband, Father, Pediatrician & Informaticist writing about whatever is on my mind for today.
