The longest interstate walk
Meandering through US states and territories with Julia, never looking back. Well, just their USPS codes.
Meandering through US states and territories with Julia, never looking back. Well, just their USPS codes.
This is not going to be as didactic (and thus not nearly as long) as my last post, but I thought I'd share my Julia code for this week's Riddler Classic.
The puzzle
The idea is to chain USPS 2-letter codes for US states & territories, overlapping 1 character in between, and not repeating any code in a chain. You can think of it as states/territories share a border if they share a letter in their 2-letter code. Or as they put it:
The challenge is to find the longest string of letters in which (1) every pair of consecutive letters is a two-letter state or territory abbreviation, and (2) no state abbreviation occurs more than once. For example, Guam, Utah and Texas can be combined into the valid four-letter string GUTX. Another valid string is ALAK (Alabama, Louisiana and Alaska), while ALAL (Alabama, Louisiana and Alabama) is invalid because it includes the same state, Alabama, twice.
Wissner-Gross, Z. https://fivethirtyeight.com/features/can-you-help-dakota-jones-raid-the-lost-arc/.
They also provided a link to a USPS page with a list of codes.
Julia solution
I first just copy/pasted the codes into a raw string, and parsed it. I also added the codes for reaching the armed forces.
tbl = """
State/Possession
Abbreviation
Alabama
AL
Alaska
AK
…
Wisconsin
WI
Wyoming
WY
"""
codes = vcat(map(strip, split(tbl, "\n\n")[4:2:end]), ["AA", "AP", "AE"])
Because these are always 2-letter codes, the simplest solution is probably to use a Char
-to-Char
multimap. I opted to use a Trie
/prefix tree:
using DataStructures
function addseqs!(results, prefix, t)
top = prefix[1:2]
if length(results[top]) < length(prefix)
println("Replacing $(results[top]) -> $prefix.")
results[top] = prefix
end
c = string(prefix[end])
s = subtrie(t, c)
s == nothing && return
for k in keys(s)
t = deepcopy(t)
delete!(t, string(c,k))
addseqs!(results, string(prefix, k[end]), t)
end
end
This function takes results
to keep track of the longest observed sequence, prefix
— the current state of the string, and t
, the Trie
with the currently available codes. It builds a sub-trie based on the end of prefix
, and recursively calls itself with a pruned trie.
The search gets kicked off with this function:
function getlongest(codes)
results = DefaultDict{String,String}("")
for first in codes
t = Trie(codes)
delete!(t, first)
addseqs!(results, first, t)
end
results
end
Both of these functions rely on deleting a key from the Trie
, which actually, is not provided in this library. In Julia however, it's easy to create new methods for types you don't own:
function Base.delete!(t::Trie{T}, k::AbstractString) where T
node = subtrie(t, k)
if node != nothing node.is_key = false end
t
end
Solution
There's not much to do from here. Running:
@time results = getlongest(codes)
# Abridged output:
#
# Replacing -> FM.
# Replacing FM -> FMT.
# Replacing FMT -> FMTX.
# Replacing FMTX -> FMTNM.
# Replacing FMTNM -> FMTNMP.
# Replacing FMTNMP -> FMTNMPA.
# Replacing FMTNMPA -> FMTNMPAP.
# Replacing FMTNMPAP -> FMTNMPAPR.
# Replacing FMTNMPAPR -> FMTNMPAPRI.
# Replacing FMTNMPAPRI -> FMTNMPAPRID.
# Replacing FMTNMPAPRID -> FMTNMPAPRIDE.
# Replacing FMTNMPAPRIDE -> FMTNMPAPRIDCA.
# Replacing FMTNMPAPRIDCA -> FMTNMPAPRIDCAK.
# Replacing FMTNMPAPRIDCAK -> FMTNMPAPRIDCAKY.
# Replacing FMTNMPAPRIDCAKY -> FMTNMPAPRIDCAKSD.
# Replacing FMTNMPAPRIDCAKSD -> FMTNMPAPRIDCAKSCT.
# Replacing FMTNMPAPRIDCAKSCT -> FMTNMPAPRIDCAKSCOK.
# Replacing FMTNMPAPRIDCAKSCOK -> FMTNMPAPRIDCAKSCOHI.
# Replacing FMTNMPAPRIDCAKSCOHI -> FMTNMPAPRIDCAKSCOHIA.
# Replacing FMTNMPAPRIDCAKSCOHIA -> FMTNMPAPRIDCAKSCOHIAA.
# Replacing FMTNMPAPRIDCAKSCOHIAA -> FMTNMPAPRIDCAKSCOHIAAL.
# Replacing FMTNMPAPRIDCAKSCOHIAAL -> FMTNMPAPRIDCAKSCOHIAALA.
# Replacing FMTNMPAPRIDCAKSCOHIAALA -> FMTNMPAPRIDCAKSCOHIAALAE.
# Replacing FMTNMPAPRIDCAKSCOHIAALAE -> FMTNMPAPRIDCAKSCOHINVAALA.
# Replacing FMTNMPAPRIDCAKSCOHINVAALA -> FMTNMPAPRIDCAKSCOHINVAALAE.
# Replacing FMTNMPAPRIDCAKSCOHINVAALAE -> FMTNMPAPRIAKSDCAALASCOHINVA.
# Replacing FMTNMPAPRIAKSDCAALASCOHINVA -> FMTNMPAPWAKSDCAALARIASCOHINJ.
# Replacing FMTNMPAPWAKSDCAALARIASCOHINJ -> FMTNMPAPWAKSDCAALARIASCOHINVA.
# Replacing FMTNMPAPWAKSDCAALARIASCOHINVA -> FMTNMPAPWVAKSDCAALARIASCOHINVI.
# Replacing FMTNMPAPWVAKSDCAALARIASCOHINVI -> FMPAPWAKSDCAALARIASCTNMNVINCOHI.
# Replacing FMPAPWAKSDCAALARIASCTNMNVINCOHI -> FMPAPWVAKSDCAALARIASCTNMNVINCOHI.
#
# 1631.254820 seconds (4.47 G allocations: 591.410 GiB, 11.13% gc time)
You can see this was probably not a very good approach, with 4 billion allocations (probably 1 for each deepcopy
). Note that actual memory usage did not exceed ~1GB.
As you might have guessed, I only showed the output for the actual longest walk starting from FM (Federated States of Micronesia, and provider of the TLD of your favorite podcast's domain, I'm sure).
To find the longest sequence:
maximum(v -> (length(v), v), values(results))
# Output:
# (32, "FMPAPWVAKSDCAALARIASCTNMNVINCOHI")
Itinerary
This is quite the mileage run:
- FM: Federated Sates of Micronesia
- MP: Northern Mariana Islands
- PA: Pennsylvania
- AP: Armed Forces Pacific
- PW: Palau
- WV: West Virginia
- VA: Virginia
- AK: Alaska
- KS: Kansas
- SD: South Dakota
- DC: District of Columbia
- CA: California
- AA: Armed Forces Americas
- AL: Alabama
- LA: Louisiana
- AR: Arkansas
- RI: Rhode Island
- IA: Iowa
- AS: American Samoa
- SC: South Carolina
- CT: Connecticut
- TN: Tennessee
- NM: New Mexico
- MN: Minnesota
- NV: Nevada
- VI: Virgin Islands
- IN: Indiana
- NC: North Carolina
- CO: Colorado
- OH: Ohio
- HI: Hawaii
If this doesn't get you elite status …