Sentinel Sees Something

Does Evan Know What He's Talking About?
Charlie does, I just dick around with radar because it looks neat.

Always show love to passive radiometers, but sometimes you gotta go active: satellite radar can see earthstuff at night, through clouds, and even peek under dry sand (something something dielectric, ask a scientist). Applications? Glad you didn't ask: you can use satellite radar to monitor illegal fishing, find oil spills, watch ice floes crunch, and spy on whatever. But radar data is harder to interpret and process than what you'd get from Landsat. You have to deal with polarizations and different sensor modes to even get the data, then just eyeballing it shows you a bunch of speckly nonsense, and getting useful knowledge out of it? Buddy.

Enough reading, let's check out a radar image:

San Francisco, CA as seen from the ESA's Sentinel 1A satellite, which carries a C-band synthetic aperture radar. This thing emits microwaves, listens for the echoes, and draws you a picture. Why's it look so different than a Landsat scene? And why's it so pixelly and fuzzy? Unlike passive sensors which get to collect reflectance data using The Best Emitter We Got (the Sun), active sensors have to hurl their own energy and see what comes back; they can only emit so much, so you'll get less image detail and miss stuff that's blocked from the sensor's view by steep terrain (see the scalloped hills in the upper left? Layover.) Also microwaves have longer wavelengths than what Landsat picks up. This can be useful in seeing through vegetation (transparent to the longer-wavelength energy emitted by Sentinel) but can create "what am I looking at" moments like the triple Golden Gate Bridge.

Fun fact: the bright cross on the ship is probably from a radar reflector, which helps the vessel show up on ship tracking radars. Being seen is a good thing when moving a mountain of goods or crude in a vehicle that takes a few miles to even change course.

This radar stuff is complicated! But you, me, and everyone we know are in luck: the wonderful 👏✨👏 Charlie Loyd 👏✨👏 showed me a quick way to process radar data just because I asked. And through his sacrifice you shall know too.

We're going to run Charlie's principal component analysis (PCA) python script on Sentinel 1A radar images. PCA takes different images as inputs (say, bands 3, 4, 5, 7 on a Landsat scene or two different radar polarizations of the same area) and tries to pick out what matters; if the analysis detects redundant information across the different input images, it throws it out, leaving only the juicy bits (bytes?) that account for the variation between them. Say you asked a program to analyze a sample of 1,000 cars. You'd be miffed if it reported "This is big: every car has four wheels!" You care about differences and deviations, the spice of life, right? That's what PCA's for. Don't cite me, I've done this like twice.
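To make that concrete, here's a toy sketch (mine, not Charlie's script) using scikit-learn: three fake "bands" of pixel values, two of which are nearly identical. PCA packs almost all the variance into the first two components and leaves the redundancy to rot in the last one.

```python
# Toy demo of PCA tossing out redundant inputs -- a sketch, not Charlie's script.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
band1 = rng.normal(size=1000)
band2 = band1 + rng.normal(scale=0.01, size=1000)  # nearly a copy of band1
band3 = rng.normal(size=1000)                      # independent signal

pixels = np.column_stack([band1, band2, band3])    # rows = pixels, cols = bands
pca = PCA().fit(pixels)
print(pca.explained_variance_ratio_)
```

The last ratio comes out near zero: that's the "every car has four wheels" information getting thrown away, while the components that explain actual differences float to the top.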

Charlie on PCA:

Think of an n-band image as a dataset with n dimensions. In other words, each pixel is a point in n-D space. For most human-important scenes you get something consistent: the first principal component will be the sorta-mostly-mean of the bands (materials that are bright in one part of the visible spectrum are usually bright in others). The second principal component will usually be basically vegetation v. not vegetation (close to NDVI, a standard index). The third – at least over cropland – tends to be wet v. dry surface features. This is the basis for a standard transform called the Tasseled Cap, which is a set of coefficients first taken from PCA of fields. PCA is also really interesting for, say, archeology, because in the furthest-down correlations you sometimes find important things like obscure combinations of plant species that like minerals leached out of ancient walls. (You could also, less wholesomely, use it for oil exploration.) PCA lets you surface stuff you’d never find if you tried.

There are a dozen channels, so it gets really weird as you go down the PCs. Some of them seem to be water vapor content, cloud height – it’s very hard to tell.

So let's grab some radar data and PCA it up. Sentinel 1A has been up since 2014, has a twin in Sentinel 1B, and beams down lots of free synthetic aperture radar data.

  1. Head to the Sentinel data hub at

  2. Sign up for a free account in the top right.

  3. Drag a box around a small area you like (try to find somewhere with interesting topography, which looks super neat under radar. But don't take my word for it).

  4. Put this string in the search field: S1A_IW_GRDH_1SDV*
    That'll ask the server for Sentinel 1A data, in the mode "interferometric wide swath," processed to the "Level 1 Ground Range Detected" product. 

    • Sentinel 1A's radar data comes in a few flavors:
      IW - interferometric wide swath - 250km swath, 5x20m spatial resolution
      EW - extra-wide swath - 400km swath, 20x40m spatial resolution
      SM - strip map - 80km swath, 5x5m spatial resolution

      If you want to get higher-resolution, narrower-swath maps over a smaller area, try this query over Chicago or Houston: *_S1_GRDH*
      If you see little color previews, you're set.

      You can also hit the hamburger to the left of the search bar to narrow the results by date.

  5. If you see thumbnails with color previews, you're on the right track. These are ~800mb compressed files that include two tiffs, one for each polarization.

  6. Find at least two scenes from different dates, with color preview thumbnails and the same footprint, then hit the arrow icon that appears on mouseover to download the files. This will be slow.

  7. The .tiffs will be in the "measurement" folder after you decompress. They're too big and speckly to run through PCA as-is; you'll shrink them to cut down on artifacts. Good old gdalwarp can drop these giant files down to 2500px wide using the cubic resampling method. You can use another resample method, but make sure it's smooth!

gdalwarp -of GTiff -r cubic -ts 2500 0 input.tif output.tif
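If you've got a pile of tiffs to shrink, here's a quick Python sketch (mine; the "measurement" folder layout is an assumption) that builds that same gdalwarp command for each file. It only constructs the commands; paste them into a shell or hand each list to subprocess.run:

```python
# Build the gdalwarp resize command for every tiff in the (assumed)
# "measurement" folder. This just prints the commands; run them yourself.
import glob

def warp_command(src, width=2500):
    # Mirror the command from the text: cubic resample, 2500px wide,
    # height scaled automatically (the trailing 0).
    dst = src.rsplit(".", 1)[0] + "_small.tif"
    return ["gdalwarp", "-of", "GTiff", "-r", "cubic",
            "-ts", str(width), "0", src, dst]

for path in sorted(glob.glob("measurement/*.tiff")):
    print(" ".join(warp_command(path)))
```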

I did this with four images, two polarizations from two dates over coastal Bulgaria. Thus:

8. Merge your resized files into a single multi-band tiff. You can use either gdal_merge.py or QGIS > Raster > Miscellaneous > Merge.
9. It's time for PCA: make sure you have the rasterio, sklearn, and numpy python packages, download Charlie's PCA script and feed it the multi-band tiff you just made. This will yield a new multiband tiff, with each band corresponding to a principal component.

$ python pca_script.py your_stacked_file.tif your_output_file.tif
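I haven't picked apart Charlie's script line by line, but the core of any PCA-on-a-stacked-tiff script looks roughly like this (a sketch, not necessarily his implementation; the rasterio I/O is left as comments):

```python
# The heart of a PCA-on-raster script: reshape (bands, rows, cols) so each
# pixel is a row, fit PCA, reshape back. A sketch, not Charlie's exact code.
import numpy as np
from sklearn.decomposition import PCA

def pca_bands(stack):
    """stack: (bands, rows, cols) array -> (bands, rows, cols) of PCs."""
    bands, rows, cols = stack.shape
    flat = stack.reshape(bands, -1).T              # one row per pixel
    pcs = PCA(n_components=bands).fit_transform(flat)
    return pcs.T.reshape(bands, rows, cols)

# With rasterio, roughly:
#   with rasterio.open("your_stacked_file.tif") as src:
#       out = pca_bands(src.read().astype("float32"))
#   ...then write `out` band-by-band using src.profile.
```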

10. I'll let the real C-band explain what you get.

Now you have a tiff whose bands are the principal components of the inputs. Principal component (PC) 1 is basically going to be “brightness”, or something close to an average of all the inputs. So it makes sense to use L*a*b* color, assigning the first three channels in that order. You can use Photoshop or, say, imagemagick. To get the color channels distributed well, you can equalize them and then lower their contrast.

I throw out the fourth PC, because it’s mostly noise from what I’ve seen. One of the nice things about PCA is that it tends to “comb” noise into the later PCs, giving you cleaner signal up front. But in practice, band 4 does seem to have some plausible features in among the static. Mixing it into band 3 might give you something good, or then again it might not.

The disadvantage of PCA, of course, is that there’s no defined interpretation of the PCs. If PC 2, say, seems to give you wet v. dry, or forest v. bare soil, that’s great, but there’s no theoretical or physical rigor to that interpretation, only statistics. Which is still great for a lot of purposes, but it’s not the same as, like, NDVI or whatever.

Something I’d like to try is throwing it much deeper stacks – like 10 observations instead of 2. You get as many bands out of PCA as you put in, so you should be able to pick out a lot more. Of course 10 observations means you’re looking at a longish time period, and there will be intrinsic change, which might throw things off a bit – PCA will start to produce dimensions that are “about” time more than about surface signal. Which is not necessarily bad!

Because you’re looking at a bunch of multiscale-heterogenous surfaces for which every pixel is presumably mixed, you expect your histograms to tend generally to converge on bell curves. (Massive generalization, broken by water among other things.) This is useful for L*a*b* because the a* and b* channels are pretty sensitive – anything far off the midpoint is really saturated. In general, you expect an L*a*b* representation of an arbitrary multiscale-heterogenous scene (like a landscape) to have a* and b* channels that approximate tight (small σ) bell curves. This is what equalizing and then lowering contrast is supposed to get you. But it can get thrown off by the big flat nodata areas, among other things. (Come to think of it, it wouldn’t be hard to mask them out and do the adjustments in skimage … hm.) Just try to get the peaks of the a* and b* bells on the midpoints as a starting point.

–Charlie Loyd, your favorite and mine


11. Open your output tiff in Photoshop (it'll be in RGB and look terrible, but it's just not using most of the range). Also create a new document with the same dimensions, but in the "Lab Color" color mode. (Hint: select all > copy in the output first; Photoshop will then auto-populate the dimensions in the New Document window.)

12. Go into just the red channel on your output > select all > copy > paste into the Lightness channel in your fresh document. Repeat for the others; blue channel > a channel, green > b channel. Add a levels or curves adjustment layer and hit "auto" for each channel or futz away with the sliders. 
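If Photoshop isn't your thing, steps 11 and 12 can be scripted. Here's my rough equivalent using scikit-image (the ±30 scaling on a* and b* is my guess at "equalize, then lower the contrast" – tweak to taste, and pass the three PC bands in whatever order you like):

```python
# Scripted version of the Photoshop steps: equalize three PC bands,
# assign them to L*, a*, b*, convert to RGB. The scaling factors are
# my assumptions, not a prescribed recipe.
import numpy as np
from skimage import color, exposure

def pcs_to_rgb(pc_for_L, pc_for_a, pc_for_b):
    L = exposure.equalize_hist(pc_for_L) * 100.0        # L* runs 0..100
    a = (exposure.equalize_hist(pc_for_a) - 0.5) * 60.0  # a*, b* centered on 0;
    b = (exposure.equalize_hist(pc_for_b) - 0.5) * 60.0  # +/-30 keeps saturation tame
    return color.lab2rgb(np.dstack([L, a, b]))
```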




13. And here's what I got: the PCA picked out forests in light blue, agricultural fields in screaming pink, impervious surfaces in yellow, and a whole lot of other hues I couldn't decode. PCA stripped away the common information and left what was distinct between the 4 input bands, and the colors just reflect what the analysis thought should be lumped together. PCA is trying to tell me something, but it’s beyond my ken.

Man that's small. Click through already.


14. The end! No moral. Remember that your inputs are just numbers to the PCA script; it doesn't know anything about remote sensing. It simply picks out anomalies in your images, and it's your job to figure out what they mean. It's in your hands now. Your burden. I'm sorry.

Illustration wrecks "data viz" every time

Does Evan Know What He's Talking About?
All opinion, I have to vocationally look at “data viz” and I’m usually not impressed.

Jorge Cham of PhD Comics makes the best possible information graphics. Illustration seems antiquated in the data viz age, but treemaps and big blinking interactives don't even approach the engagement and explanatory value that come out of his Wacom. His comic infographics also translate beautifully to animation: the above is a screencap of his drawing process, overlaid with commentary by the grad students he interviewed. It doses the information at a comfortable pace, gently demonstrating to the viewer the importance of stellar physics in understanding the universe.

Data viz can explore trends but can't approach illustration's natural fit for relating concepts. This becomes obvious when the topic is new to most people: hard science. Let's stack up the fine artists vs. the number pukers.

The neurology graphic above, also by Cham, took the infographics gold at the NSF's 2009 Visualization Challenge. How'd it beat out a big pile of charts? Accessibility. Illustrated infographics speak to us in ways that abstract charts never can; people just like comics. They're intuitive, easier to get into than big colorful visualizations. They literally speak our primitive hindbrain language: pictures! Little people! That looks like me! I'm listening! Way more fun than a big awful tag cloud! What the hell was that trying to tell me anyway!


Jon Chui works in the same vein, making illustrative primers for hardcore chemistry concepts. Way above my head but I'm extremely down with the approach.

Accompanied this article on genetics research.


In contrast, this science graphic published in the New York Times is a woefully bad fit for a lay audience, using a visualization that works great in the bioinformatics lab but falls flat in a broadsheet. I went so far as to ask the New York Times graphics editor to explain this to me, but he couldn't tell me what this pile of "epigenetics + Circos = ???" was even about.

I'm all about big, impossibly dense graphics, but most readers want nothing to do with this. They want to see the jewelry. Don't hand them the pan and point to the river. That's an awful metaphor but I'm unhealthily attached to it. Anyway in conclusion, when it comes to disseminating information I'll always prefer the skilled illustrator over the R munger. Maybe that's the David Macaulay nostalgia talking, but this just works better.