Apple®’s branded house model means its trademarks share a lot of structure. Terms repeat, and patterns recur. The names are structured enough to be tokenised by splitting on transitions from lowercase to uppercase, on white space, and around symbols such as ‘®’:
iPad Air® ⇒
["i", "Pad", " ", "Air", "®"]
MacBook Air® ⇒
["Mac", "Book", " ", "Air", "®"]
There are a few exceptions like ‘X’, which is its own token in e.g. Xserve® and Xcode®.
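The splitting rule above can be sketched in a few lines of Python. This is only an illustration, not the code behind the post: the function name `tokenise` is made up, and the ‘X’ handling is one assumed way of covering the exception (any token made of ‘X’ followed by lowercase letters is split after the ‘X’).

```python
import re

# Split at lowercase-to-uppercase boundaries (a zero-width match), and keep
# runs of white space and individual symbols as tokens via capture groups.
BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(\s+)|([^\w\s])")

# Assumed exception rule: 'X' followed only by lowercase letters is split off.
X_PREFIX = re.compile(r"X[a-z]+")


def tokenise(name: str) -> list[str]:
    """Break a trademark into tokens, keeping spaces and symbols."""
    tokens = []
    for piece in BOUNDARY.split(name):
        if not piece:  # skip None (unused capture group) and empty strings
            continue
        if X_PREFIX.fullmatch(piece):
            tokens += ["X", piece[1:]]
        else:
            tokens.append(piece)
    return tokens


print(tokenise("iPad Air®"))     # ['i', 'Pad', ' ', 'Air', '®']
print(tokenise("MacBook Air®"))  # ['Mac', 'Book', ' ', 'Air', '®']
print(tokenise("Xcode®"))        # ['X', 'code', '®']
```

Keeping the space and ‘®’ as tokens means the original name can be rebuilt by joining the list back together.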
Breaking out the tokens shows that case is mostly consistent. Exceptions include ‘Mac’, ‘TV’, ‘Vision’ and ‘Watch’ (capitalised for hardware, lowercase when followed by ‘OS’) and ‘Touch’ (capitalised except in ‘iPod touch®’). It’s also good to see ‘+’ rapidly overtaking ‘.’ and ‘-’ as the most common punctuation.
The natural thing to do is place these on a graph, where each distinct token is a node linked to all tokens that appear before or after:
Notably, some tokens see broad use (‘Apple’ and ‘i’ in around fifty trademarks each, and ‘Mac’, ‘Pro’, ‘Air’ in around twenty), whereas some are used in small clusters (three ‘Engine’s and ‘Drop’s; three ‘Writer’s, all printers).
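A minimal sketch of that graph, assuming a plain dictionary of neighbour sets is enough. The names `build_graph` and `SAMPLE` are hypothetical, the tokeniser is a simplified stand-in without the ‘X’ exception, and the five sample trademarks are a tiny subset chosen for illustration, so the counts it prints are not the figures quoted above.

```python
import re
from collections import Counter, defaultdict

BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(\s+)|([^\w\s])")


def tokenise(name):
    # Simplified tokeniser: case transitions, white space, symbols.
    return [piece for piece in BOUNDARY.split(name) if piece]


def build_graph(trademarks):
    """Return (neighbours, usage): adjacency sets and per-token trademark counts."""
    neighbours = defaultdict(set)
    usage = Counter()
    for name in trademarks:
        tokens = tokenise(name)
        usage.update(set(tokens))  # count each token once per trademark
        for before, after in zip(tokens, tokens[1:]):
            neighbours[before].add(after)
            neighbours[after].add(before)
    return neighbours, usage


# Illustrative subset only, not the full trademark list.
SAMPLE = ["iPad Air®", "MacBook Air®", "MacBook Pro®", "iPad Pro®", "Apple TV+"]

neighbours, usage = build_graph(SAMPLE)
print(usage["Air"], sorted(neighbours["Air"]))
```

Counting each token once per trademark gives the kind of "used in N trademarks" figure mentioned above, while the neighbour sets give the edges to draw.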
Weight the transitions according to current trademark usage to generate a Markov chain. Start on a token that begins at least one trademark, and finish on one that ends at least one. The resulting chains will, usefully, predict unused trademarks, including:
Following the existing structure means a few sound more “when” than “if.”
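For the curious, the chain described above can be approximated as follows. This is a sketch under assumptions rather than the original implementation: transitions are weighted by how often each pair of tokens is adjacent in a small illustrative sample, hypothetical `START` and `END` markers stand in for "starts a trademark" and "ends a trademark", and a length cap guards against loops.

```python
import random
import re
from collections import Counter, defaultdict

BOUNDARY = re.compile(r"(?<=[a-z])(?=[A-Z])|(\s+)|([^\w\s])")
START, END = "<start>", "<end>"  # hypothetical sentinel tokens


def tokenise(name):
    # Simplified tokeniser: case transitions, white space, symbols.
    return [piece for piece in BOUNDARY.split(name) if piece]


def build_chain(trademarks):
    """Count token-to-token transitions, including start and end markers."""
    chain = defaultdict(Counter)
    for name in trademarks:
        tokens = [START, *tokenise(name), END]
        for before, after in zip(tokens, tokens[1:]):
            chain[before][after] += 1
    return chain


def generate(chain, rng, max_tokens=12):
    """Walk the chain from START, picking each step in proportion to its count."""
    token, out = START, []
    while len(out) < max_tokens:
        options = chain[token]
        token = rng.choices(list(options), weights=list(options.values()))[0]
        if token == END:
            break
        out.append(token)
    return "".join(out)


# Illustrative subset only, not the full trademark list.
SAMPLE = ["iPad Air®", "iPad Pro®", "iPad mini®", "MacBook Air®",
          "MacBook Pro®", "Apple TV+", "Apple Watch®"]

chain = build_chain(SAMPLE)
rng = random.Random(0)
candidates = {generate(chain, rng) for _ in range(200)} - set(SAMPLE)
print(sorted(candidates))
```

Subtracting the known names leaves only the combinations that are not already in the input list. Note that the length cap can cut a walk short on a larger, loopier dataset, in which case the output may not end on a valid final token.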