After I wrangled 2 spreadsheets (and 1 Google sheet) into database tables, my first picks for primary keys needed to be changed based on how the data were and how I wanted things to be.
Whoever did the manual entry into rows and columns used month and year as the key separators, since the events were generally produced every month (for literal decades). So it made sense that the first attempt used these as keys. I don't like to use a serial number key for tasks like this as part of the outcome should be normalizing the data; getting an error message that there are duplicate months in a year needs to be investigated.
Among other things I found were complete cast/crew duplicates for 1 show that ran for 2 months (others had only ditto marks, making them easier to spot and tidy up). I needed to switch from a monthly key to a sequence value because after 2000, the shows were scheduled for several week runs but not staying within a month. And there were weekend shows, plus multi-play productions in single ticket access.
References at the end, one LibreOffice and two DBA Stack Exchange
I started with this key:
primary key (year_str, month_str, production, director, person, role)
The year and month are strings, instead of integers because there were originally a variety of formats which I fixed after initial load.
The propose change:
ALTER TABLE cast_cooked DROP CONSTRAINT cast_cooked_pkey;
ALTER TABLE cast_cooked ADD PRIMARY KEY (season, sequence, production, person, role, band_cast_crew);
Using the example questions, I deleted duplicate rows as I found them, mainly trial-and-error because the first line above always works and the second only when all dupes are clear.
I was overzealous in thinking I should include what turned out to be an indistinct key:
psql:alter_key.sql:3: ERROR: constraint "cast_cooked_pkey" of relation "cast_cooked" does not exist
psql:alter_key.sql:4: ERROR: column "sub_seq" of relation "cast_cooked" contains null values
All nulls are the same, not unique. Turns out this key also messed with LibreOffice forms, as mismatched tables showed no rows in one (scary!). Had to be removed from the primary key, or filled somehow; I chose the former as faster.
ALTER TABLE
psql:alter_key.sql:4: ERROR: could not create unique index "cast_cooked_pkey"
DETAIL: Key (season, sequence, sub_seq, person, role)=(4, 11, C, First Last, Director) is duplicated.
These were usually easy, except for the 44 rows of cast+crew dupes, and then one complete duplicate (don't ask).
db=> delete from cast_cooked where season = 12 and sequence = 7 and month_str = '5' ;
DELETE 44
A simple statement but should be sanity claused.
Drop the table constraint and retry:
psql:alter_key.sql:3: ERROR: constraint "cast_cooked_pkey" of relation "cast_cooked" does not exist
psql:alter_key.sql:4: ERROR: could not create unique index "cast_cooked_pkey"
DETAIL: Key (season, sequence, production, person, role, band_cast_crew)=(26, 3, Scrooge (shocking), First Last, Director, crew) is duplicated.
db=> select * from cast_cooked where season = 26 and sequence = 3 and role = 'Director' order by month_str, person;
year_str | month_str | production | director | person | role | band_cast_crew | season | sequence | opening | closing | sub_seq
------------+------------+--------------------+-----------------+-----------------+----------+----------------+--------+----------+---------+---------+---------
87 | 12 | Scrooge (shocking) | First Last| First Last | Director | crew | 26 | 3 | | |
87 | 12 | Scrooge (shocking) | Firss Last| First Last | Director | crew | 26 | 3 | | |
There was another typo I redacted, duplicated, as director was not a key value.
To remove a duplicate you might need a "ROW ID" which in PostgreSQL acts like this:
db => select ctid from cast_cooked where season = 26 and sequence = 3 and role = 'Director' order by month_str, person;
ctid
--------
(5,14)
(5,30)
(2 rows)
db=> delete from cast_cooked where season = 26 and sequence = 3 and role = 'Director' and ctid = '(5,30)';
DELETE 1
Then the anticlimactic "done" success message (along with the "you already did that" warning).
db=> \i alter_key.sql
psql:alter_key.sql:3: ERROR: constraint "cast_cooked_pkey" of relation "cast_cooked" does not exist
ALTER TABLE
db=>
And now, better queries!
Two-step also:
~ ~ ~
References:
https://books.libreoffice.org/en/BG72/BG7204-Forms.html#toc41
https://dba.stackexchange.com/questions/138320/duplicate-rows-how-to-remove-one
https://dba.stackexchange.com/questions/159793/how-to-set-a-not-null-column-into-null-in-postgresql









