Dr. Katz: Professional Therapist (coming to DVD this year?) and The Larry Sanders Show were both great '90s US comedies, albeit totally different in style. One similarity was the eclectic mix of comedians and actors as guests, a time capsule of significant performers; so who appeared on both shows?
The current data for Mivvi (introduction) includes, for some series, exactly this information, scraped from different sources but using common IMDb URIs for people.
SPARQL is an RDF query language, currently being prepared by the W3C’s RDF Data Access Working Group. (Disambiguation: Sparql is also the name of Danny Ayers’ cat.) The language is still under development, but there are many implementations of the various drafts. I chose Rasqal, with its Roqet front-end (available as rasqal-utils in Debian).
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX mvi: <http://mivvi.net/rdf#>
SELECT ?c, ?title, ?episode1, ?larryTitle, ?episode2, ?katzTitle
FROM <a.rdf>
WHERE {
  <http://en.wikipedia.org/wiki/The_Larry_Sanders_Show#> mvi:seasons ?s1.
  ?s1 ?w ?season1.
  ?season1 mvi:episodes ?es1.
  ?es1 ?x ?episode1.
  ?episode1 dc:contributor ?c.
  ?episode1 dc:title ?larryTitle.
  <http://www.sassman.com/katz/#> mvi:seasons ?s2.
  ?s2 ?y ?season2.
  ?season2 mvi:episodes ?es2.
  ?es2 ?z ?episode2.
  ?episode2 dc:contributor ?c.
  ?episode2 dc:title ?katzTitle.
  ?c dc:title ?title.
}
The intent should be apparent, if not the syntax: find any chain, from season to episode, for both series. By requiring the same contributor, ?c, for both series, we will only get results where the same person appeared in both series. The output will be the variables that satisfied the match.
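To make the join on ?c concrete, here is a minimal Python sketch of how a query engine might match those patterns against a set of triples. It is not how Rasqal works internally; the matcher is a deliberately naive illustration, and every ex: name and blank node label in the toy data is hypothetical.

```python
# Toy triples standing in for the merged graph; all ex: names are hypothetical.
TRIPLES = [
    ("ex:larry", "mvi:seasons", "_:ls"),
    ("_:ls", "rdf:_1", "ex:larry-s1"),
    ("ex:larry-s1", "mvi:episodes", "_:le"),
    ("_:le", "rdf:_1", "ex:larry-e1"),
    ("_:le", "rdf:_2", "ex:larry-e2"),
    ("ex:larry-e1", "dc:contributor", "ex:alice"),
    ("ex:larry-e2", "dc:contributor", "ex:bob"),
    ("ex:katz", "mvi:seasons", "_:ks"),
    ("_:ks", "rdf:_1", "ex:katz-s1"),
    ("ex:katz-s1", "mvi:episodes", "_:ke"),
    ("_:ke", "rdf:_1", "ex:katz-e1"),
    ("ex:katz-e1", "dc:contributor", "ex:bob"),
    ("ex:katz-e1", "dc:contributor", "ex:carol"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns (a naive join)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


def chain(series, tag):
    """Patterns from a series down to an episode's contributor, sharing ?c."""
    return [
        (series, "mvi:seasons", f"?s{tag}"),
        (f"?s{tag}", f"?m{tag}", f"?season{tag}"),
        (f"?season{tag}", "mvi:episodes", f"?es{tag}"),
        (f"?es{tag}", f"?n{tag}", f"?episode{tag}"),
        (f"?episode{tag}", "dc:contributor", "?c"),
    ]


patterns = chain("ex:larry", 1) + chain("ex:katz", 2)
shared = sorted({b["?c"] for b in match(patterns, TRIPLES)})
print(shared)
```

Because ?c is the only variable the two chains have in common, a binding survives only when the same contributor hangs off an episode of each series.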
(You could also use inference to bring down an mvi:series
predicate for each episode. This would make the query far simpler,
at the expense of adding an extra processing step or requiring an
RDF store with inference.)
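As a rough illustration of that extra processing step, the Python sketch below materialises an mvi:series triple for each episode and then answers the question with two simple lookups. The direction of the predicate (episode to series), the helper names and the toy data are all assumptions made for the example; a real setup would more likely use rules in cwm or an inferencing store.

```python
import re

# Container membership predicates, written with the rdf: prefix for brevity.
MEMBER = re.compile(r"rdf:_[1-9][0-9]*")


def members(triples, container):
    """Objects attached to a container via rdf:_1, rdf:_2, ..."""
    return [o for s, p, o in triples if s == container and MEMBER.fullmatch(p)]


def infer_series(triples):
    """Derive (episode, mvi:series, series) for every episode of every series."""
    inferred = set()
    for series, predicate, season_seq in triples:
        if predicate != "mvi:seasons":
            continue
        for season in members(triples, season_seq):
            for s, p, episode_seq in triples:
                if s == season and p == "mvi:episodes":
                    for episode in members(triples, episode_seq):
                        inferred.add((episode, "mvi:series", series))
    return inferred


def guests(triples, series):
    """Contributors to any episode that carries the inferred mvi:series link."""
    episodes = {s for s, p, o in triples if p == "mvi:series" and o == series}
    return {o for s, p, o in triples if p == "dc:contributor" and s in episodes}


# Hypothetical data: one season per series, ex: names invented for the sketch.
TRIPLES = {
    ("ex:larry", "mvi:seasons", "_:ls"),
    ("_:ls", "rdf:_1", "ex:larry-s1"),
    ("ex:larry-s1", "mvi:episodes", "_:le"),
    ("_:le", "rdf:_1", "ex:larry-e1"),
    ("ex:larry-e1", "dc:contributor", "ex:bob"),
    ("ex:katz", "mvi:seasons", "_:ks"),
    ("_:ks", "rdf:_1", "ex:katz-s1"),
    ("ex:katz-s1", "mvi:episodes", "_:ke"),
    ("_:ke", "rdf:_1", "ex:katz-e1"),
    ("ex:katz-e1", "dc:contributor", "ex:bob"),
    ("ex:katz-e1", "dc:contributor", "ex:carol"),
}

enriched = TRIPLES | infer_series(TRIPLES)
both = guests(enriched, "ex:larry") & guests(enriched, "ex:katz")
print(both)
```

With the shortcut in place, each half of the query would shrink to roughly two patterns per series: one for mvi:series and one for dc:contributor.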
The version of Rasqal I was using had no support for multiple FROM graphs, so I merged the RDF ahead of time (cwm dapcentral/dr-katz.rdf epguides/the-larry-sanders-show.rdf extras/the-larry-sanders-show_guests.rdf >a.rdf).
SPARQL doesn’t appear to support rdf:Seq, so the single-letter dummy variables (w, x, y, z) are used as an approximation of rdf:_[0-9]+ to mean any indexed member of a sequence.
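If the results were post-processed outside the query, the wildcard could be tightened after the fact. The Python sketch below checks whether a predicate URI is a container membership property and recovers its index. The pattern is slightly stricter than rdf:_[0-9]+ because RDF Schema defines rdf:_n for positive integers without leading zeros; the function name and sample rows are made up for illustration.

```python
import re

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# rdf:_1, rdf:_2, ... (n greater than zero, no leading zeros)
MEMBERSHIP = re.compile(re.escape(RDF_NS) + r"_([1-9][0-9]*)")


def member_index(predicate_uri):
    """Return n for rdf:_n, or None for any other predicate."""
    found = MEMBERSHIP.fullmatch(predicate_uri)
    return int(found.group(1)) if found else None


# Hypothetical (predicate, object) pairs as a wildcard variable might bind them.
rows = [
    (RDF_NS + "_2", "ex:episode-two"),
    (RDF_NS + "type", RDF_NS + "Seq"),
    (RDF_NS + "_1", "ex:episode-one"),
]

indexed = [(member_index(p), o) for p, o in rows if member_index(p) is not None]
ordered = [o for _, o in sorted(indexed)]
print(ordered)
```

In the query itself the looseness is mostly harmless: if the data types its sequences, a wildcard can also bind to rdf:type, but the patterns that follow would then fail to match and drop that row.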
SPARQL queries can result in tabular data or RDF graphs. For this query, and to present with XSLT, neither is perfect. Fresnel looks like it might be worth investigation but, for now, let’s go with tabular XML output (roqet -r xml-v1 multiple-appearances.sparql >contributors2.xml) and a whole load of XSLT munging.
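Much of that munging amounts to regrouping: the table has one row per pair of appearances, while the page wants one entry per person. The sketch below does the regrouping in Python rather than XSLT, purely to show the shape of the problem. It assumes the layout of the later W3C SPARQL Query Results XML Format, which may well differ from what the xml-v1 option produced at the time, and the sample values are invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace of the W3C SPARQL Query Results XML Format (an assumption here).
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="title"/>
    <variable name="larryTitle"/>
    <variable name="katzTitle"/>
  </head>
  <results>
    <result>
      <binding name="title"><literal>Example Person</literal></binding>
      <binding name="larryTitle"><literal>Larry A</literal></binding>
      <binding name="katzTitle"><literal>Katz X</literal></binding>
    </result>
    <result>
      <binding name="title"><literal>Example Person</literal></binding>
      <binding name="larryTitle"><literal>Larry B</literal></binding>
      <binding name="katzTitle"><literal>Katz X</literal></binding>
    </result>
  </results>
</sparql>"""


def rows(xml_text):
    """Yield one dict per result row, mapping variable name to its text value."""
    root = ET.fromstring(xml_text)
    for result in root.findall("sr:results/sr:result", NS):
        yield {
            binding.get("name"): "".join(binding.itertext()).strip()
            for binding in result.findall("sr:binding", NS)
        }


def group_by_person(xml_text):
    """Collapse the row-per-pair table into one entry per contributor."""
    grouped = defaultdict(lambda: {"larry": set(), "katz": set()})
    for row in rows(xml_text):
        entry = grouped[row["title"]]
        entry["larry"].add(row["larryTitle"])
        entry["katz"].add(row["katzTitle"])
    return dict(grouped)


people = group_by_person(SAMPLE)
print(people)
```

Sets are used for the episode titles because someone with several appearances on each show turns up once per combination, so the same title repeats across rows.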
(Due to methodology, none of the principals were included: both Jonathan Katz and Garry Shandling guested on each other’s shows, and Janeane Garofalo and Sarah Silverman were Sanders cast members who appeared on Katz.)
Jon Stewart and Steven Wright were among the more prominent figures of 1990s television comedy. (Both also appeared in The Aristocrats; it’s no They Rule, but you might want to cross-reference that cast list.)
SPARQL is here and it works. Most RDF repositories had their own proprietary query languages before, but standardisation should make it easier to move between implementations.
The boundary between RDF and HTML still feels like an impedance mismatch at the structural level. I’m not sure which side needs to move, or if there’s simply a better approach that I’ve missed. It’s always possible to write code to get the presentation you need, but rarely desirable.