I have encountered this issue several times recently and I've been struggling to confirm the source of the problem.
Most recently, I saw this on a domain that uses Kerio for email. We'll call their domain example.com. The SPF record for example.com looks like this:
"v=spf1 include:servers.mcsv.net include:isync.io ~all"
The bounce message the client received specifies the following under the Diagnostic-Code response header:
spf [example.com] with ip:[207.254.17.11]=did not pass
However, you can clearly see that 207.254.17.11 is allowed by the SPF record of isync.io (it falls inside ip4:207.254.17.0/28, which covers 207.254.17.0 through 207.254.17.15), as seen below:
v=spf1 a mx a:kmail.isync.io ip4:207.254.17.0/28 ip4:209.123.15.34 ip4:207.254.16.176/28 a:dispatch-us.ppe-hosted.com include:spf.em.secureserver.net include:spf.smtp2go.com -all
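To double-check the range arithmetic rather than eyeball it, here is a minimal sketch using Python's standard ipaddress module. The list contains only the ip4 mechanisms copied from the isync.io record quoted above; the function name matching_networks is just an illustrative helper, and this does not evaluate the a, mx, or include mechanisms.

```python
import ipaddress

# ip4 mechanisms copied from the isync.io SPF record quoted above.
ISYNC_IP4_MECHANISMS = ["207.254.17.0/28", "209.123.15.34", "207.254.16.176/28"]


def matching_networks(ip, mechanisms):
    """Return the ip4 networks in `mechanisms` that contain `ip`.

    Hypothetical helper for illustration; a bare address such as
    209.123.15.34 is treated as a /32 network.
    """
    addr = ipaddress.ip_address(ip)
    return [m for m in mechanisms
            if addr in ipaddress.ip_network(m, strict=False)]


if __name__ == "__main__":
    # The IP named in the bounce message.
    print(matching_networks("207.254.17.11", ISYNC_IP4_MECHANISMS))
```

Run against the IP from the bounce message, this reports a match on 207.254.17.0/28 only, which is consistent with the reading that the sending server is authorized by the published record.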
Things I've tried or considered
DNS stale cache
I've checked all the relevant records using SecurityTrails and found no changes that would have been relevant at the time of the SPF validation in question.
SPF record lookup limit (10)
The total number of lookups for example.com's SPF record is 8, which is below the lookup limit.
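For reference, the count can be reproduced roughly as follows. This is a sketch, not a full SPF evaluator: it only counts the terms that RFC 7208 charges against the 10-lookup limit (include, a, mx, ptr, exists, and the redirect modifier) in the two records quoted in this question. It performs no DNS queries, so any lookup-causing terms inside the nested includes (servers.mcsv.net, spf.em.secureserver.net, spf.smtp2go.com) would have to be added by hand. The function name count_lookup_terms is my own.

```python
# Mechanisms that cost a DNS lookup under RFC 7208 section 4.6.4.
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}

# The two records quoted in the question (no DNS is queried here).
EXAMPLE_COM = "v=spf1 include:servers.mcsv.net include:isync.io ~all"
ISYNC_IO = (
    "v=spf1 a mx a:kmail.isync.io ip4:207.254.17.0/28 ip4:209.123.15.34 "
    "ip4:207.254.16.176/28 a:dispatch-us.ppe-hosted.com "
    "include:spf.em.secureserver.net include:spf.smtp2go.com -all"
)


def count_lookup_terms(record):
    """Count the terms in one SPF record that trigger a DNS lookup."""
    count = 0
    for term in record.split()[1:]:      # skip the "v=spf1" version tag
        term = term.lstrip("+-~?")       # drop the qualifier, if any
        if term.lower().startswith("redirect="):
            count += 1
            continue
        # Strip any ":domain" argument and "/cidr" suffix to get the name.
        name = term.split(":", 1)[0].split("/", 1)[0].lower()
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count


if __name__ == "__main__":
    total = count_lookup_terms(EXAMPLE_COM) + count_lookup_terms(ISYNC_IO)
    print(total)
```

The two records shown contribute 2 and 6 lookup terms respectively, which matches the total of 8 stated above as long as the nested includes add none of their own.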
Forwarding/wrong server being validated
I found a blog post and an old ServerFault question indicating that Google sometimes improperly uses the client IP for the SPF check rather than the IP of the server that actually hands off the message to Google, but in this case the bounce message specifies that the correct server was checked. The only possibility I can think of here is that the bounce message is simply untrue: the SPF validation is, in fact, being run against the client IP, but the bounce message is constructed from the actual sending server's IP. Short of someone on Google's engineering team responding to this, I would have no way to verify that.
DNS lookup failure/Google SPF validation bug
Having eliminated all of the possibilities above, I'm left with the speculative conclusion that there are just random intermittent DNS lookup failures when validating the SPF record, or there is some other breakdown in Google's (apparently non-standard) SPF validation process. The blog post I referenced earlier speculates that Google's servers are not actually validating the message as it's received, but rather parsing the header information at a later time to try to validate the messages. Indeed, the fact that the client IP is sometimes incorrectly used for validation is evidence of this, and Google is certainly known to take a "spec be damned" approach to their services (see the never-ending autocomplete=off saga).
What else am I missing?
My speculative answer cannot be verified, and it's very difficult to explain to non-technical people without reducing the answer to "Sometimes Google doesn't do what it's supposed to do with emails." Is there any other avenue I can explore, either to lend credence to my speculative answer, or to find an alternative explanation?
* Quick footnote: in this case, example.com does not have DKIM set up; however, I've seen identical problems with email from domains that do have DKIM properly implemented, and the bounce message indicates that BOTH have failed even if another part of the header shows that DKIM passed validation.