Note that their infographic ("easy -- just choose four random words") doesn't emphasise how important the random bit is when choosing the words.
If you pick a password like "the cat sat on the mat", that is not random and therefore much more predictable.
I think they mentioned doing some sanitization to try to catch low-hanging fruit like 'P@ssw0rd' and 'qwertyuiopasdfghjklz'. I think the key point is that the policy is based on an estimate of entropy rather than some fixed rule.
I also agree that they very much need to emphasize the random part - perhaps even by suggesting something like diceware or providing a tool to help generate good passwords.
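For what it's worth, a generator along these lines is only a few lines of code. This is a minimal sketch, not anyone's actual tool: the `WORDS` list and the `random_passphrase` name are made up for illustration, and a real Diceware list has 7776 entries rather than eight. The important part is that the words come from a cryptographically secure random source and not from the user's head.

```python
import secrets

# Hypothetical stand-in word list for illustration only.
# A real Diceware list has 7776 entries; eight words is far too few for real use.
WORDS = ["correct", "horse", "battery", "staple", "cloud", "river", "mango", "pencil"]

def random_passphrase(words, count=4, sep=" "):
    """Pick `count` words independently and uniformly using a CSPRNG."""
    return sep.join(secrets.choice(words) for _ in range(count))

print(random_passphrase(WORDS))
```

`secrets.choice` is used instead of `random.choice` because the `random` module is not intended for security purposes.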
The other problem is that using four words to get up to a length of 16 doesn't actually get you 16 characters of entropy. It's probably closer to about 8 characters of entropy. Which is still good. But not 16.
A diceware password of 4 words would give you about 51 bits of entropy. That is roughly equivalent to an 11-character all-lowercase alpha password, a 9-character upper-lower alpha password, or an 8-character password drawn from upper-lower alpha, digits and symbols (like you said). Hence why entropy is a much better measure of brute-force password strength than character count.
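The arithmetic behind those numbers is easy to check. A rough sketch, assuming a 7776-word Diceware list, uniformly random choices, and 94 printable ASCII characters for the "everything" alphabet (the function names are just for illustration):

```python
import math

def passphrase_bits(list_size, word_count):
    """Entropy in bits of `word_count` words drawn uniformly from a list."""
    return word_count * math.log2(list_size)

def equivalent_length(bits, alphabet_size):
    """Length of a uniformly random password over `alphabet_size` symbols
    that carries the same number of bits."""
    return bits / math.log2(alphabet_size)

bits = passphrase_bits(7776, 4)  # assumed Diceware list size: 6**5 = 7776
print(f"4 Diceware words: {bits:.1f} bits")

# Assumed alphabet sizes: 26 lowercase, 52 mixed case, 94 printable ASCII.
for name, size in [("lowercase", 26), ("upper+lower", 52), ("printable ASCII", 94)]:
    print(f"{name}: about {equivalent_length(bits, size):.1f} characters")
```

This only holds if every word or character is picked at random; a phrase a person makes up has far less entropy than the formula suggests.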
A possible challenge to implementing this is whether or not downstream systems & applications support long passwords.
Agreed. One of my gripes with moving to a password manager with random passwords has been "central passwords", where systems all use the same password database (say LDAP) but each have their own password login. Single sign-on plus app-specific passwords for legacy systems seems like a good solution.