Thursday, March 21, 2024

WHERE clause, or FILTER? QGis and small databases

Continuing with getting more QGIS skills, I planned to run the tutorial for changing symbols based on rules or other data conditions. Before I got there, I found a treasure trove of GIS data feeds from the local state government, particularly apt as the sources include property shapes as well as many details on property values. I had looked at the Comptroller's site first only to find the better results on the state planning site.

I had gathered a few data records on specific properties from public information about drinking water wells and mapped a good set of views with QGIS. I was trying not to trace property lines from downloaded imagery and was rewarded for my delay by finding the official records. Of course with caveats, as with any data feed, especially government fed.

I found 4 data sets eventually, 3 as data points and the fourth as polygons. 2 of the 3 point files have  "property" records with one having few columns and the other many; the 3rd has building-specific info.

For my first data loads, I unzipped the procured files then added that data set as a shapefile by telling QGis that the file was the one with the dBASE extension (.dbf). Later I tried point the "Add Layer" file finder at the ZIP file without unpacking the archive and that seemed to work fine. Yay QGis!



Having the data in the GIS console opened many more possibilities. Once I figured out which key column to use (Account ID, go figure) I was able to create a join in QGis. A different experience than mine with database joins with command line SQL scripts, and a bit wrinkly around the edges I'd say.

Part of what I learned in the loading attempts was that source file names might need to be escaped in order to write SQL scripts or filters with QGis itself. The couple files with ASCII/mixed case simply worked with quotes around it. The file name with a Unicode "em dash" on the other hand i struggled with.

SELECT * FROM "public"."CAROparcels0124 — CAROMDPV.shp" LIMIT 10

I worked on creating a view off that table only to hit syntax errors due to bad encoding. I had backslash feedback of multiple kinds "\x0ef" e.g. Eventually I used a connection that maintained the correct characters to record the view definition.

The coolest feature that I found in QGis in this area was being able to drag a locally sourced shape file matrix layer with the GUI and drop it on top of an already open database connection. And hey! The DDL (data definition language) is complete. No scripting, no column defining, just done. I tried to do a screen grab while the create table and insert statements were running, but even with a Raspberry Pi as the "database server."


This showed the stage at 93% complete. I learned to be patient and just let the gears run.

I went back to database filters quickly after checking out QGis filter menu, for performance reasons. If I was looking at a wide area I would want to be able to pan from one small space to another at the price of slow start-up and possibly slow pans as more data are pulled into active stores. On a quick machine you may not notice such sluggishness; for learning purposes I am trying the tutorial lessons on high-ish end x86s and on low-ish end Raspberry Pis. 15,000 records transferred with 100 columns per row is noticeably a lag.

Knowing the data helps decide which fields are filter candidates. In this effort, I recognized some of the columns by their content, not so much their names, and I'll eventually read the data dictionary docs to become more familiar. The interesting ones included kinds of industrial and housing/zoning classification, year structure built and whether a building was on site.

I ended up choosing a political jurisdiction code after trying census tract and looking at other candidates. The first 2 characters are the county, which is constant in the files I pulled, with the next 2 being a district (state level?). That reduced the record counts from 15,000 or so to 3,000, depending on which source file I filtered.

The view definition gets all columns (for simplicity); if I had to, I would look for columns with all NULL or single-values and omit them.

create or replace view carobldg_dist07_v as
select * from "CaroBldg"
where acctid like '0607%'
;

The joins seemed pretty straightforward until I found one property with 2 IDs. Or one ID and 2 property uses. I forget now, and this is my *not to self* to run Pareto over there.

In a later run, the Baltimore County file has 10x what Caroline has, meaning 1/10 the speed in general. Again, character encoding issues.


(~4 minutes for one copy/paste)

> select count(*)  from "BACOparcels0124 \U+00E2\U+0080\U+0094 BACOPOLY.shp"
;
 count  
--------
 233227
(1 row)




2024-03-16T01:02:21     INFO    Task complete : Exporting BACOparcels0124 — BACOMDPV.shp2024-03-16T01:06:08     INFO    Task complete : Exporting BACOparcels0124 — BACOPOLY.shp


And, below, coordinate reference misses; more on that later once I grok the state-provided X-Y pairs.

Below is the Pi4 image of QGis 3.28 on NetBSD 10.0 (RC6).
The log times don't show exactly as I'd like; still looking for any performance logs.
I think that message about shapes is for the data load step, except they aren't in order it seems.

The highlighted area with data details on the right side is part of Baltimore's drinking water supply, showing a border of the City-owned property. Between that 500+ acre parcel is a thin green strip labelled "Loch Raven Reservoir Cooperative Wildlife Management Area."
On the left, I now have a stack of data sources, including "all" of the county info, or a portion via a database view. I'd plan on using the large source, trying different filter logic, and creating more views for the useful fractions.


(Santa, please bring us geoPDF in a future pkgsrc qgis package!)

No comments: