UPDATE: Much ado about nothing
The NHK piece I watched this morning turns out to have been total crap, built around what was essentially a staged sending of a password. My apologies for being duped. I should have seen through the bullshit, and I'll explain why below.
But first, the security company that was featured has posted a clarification on their blog. Both the Baidu IME and Simeji do cloud conversion of Japanese text, that is, conversion of 2-byte hiragana（全角文字）to kanji. To do this, they send all 2-byte text to the cloud, and the company claims the text is sent even when the option is turned off. So, yes, this would seem to be a bug. If the cloud option is off, nothing should be done in the cloud.
However, it does not send standard (single byte) text at all.
Credit card numbers and passwords are always in single-byte text, which means that neither the Baidu IME nor Simeji would have sent them, and the clarification explains just that:
Baidu IME , Simejiでは、全角入力の場合のみ情報が送信されています。クラウド入力Offの場合でも入力文字列を送信していました。パスワードなど半角入力のみの場合は送信されていません。クレジット番号や電話番号も変換しなければ送られません。
The Baidu IME and Simeji only send information when text is entered as two-byte text. This happens even when cloud input is turned off. Passwords that are in single-byte text are not sent. Credit card and phone numbers are also not sent if they do not require conversion. [Emphasis is original]
What this last sentence says is that if you enter the numbers in two-byte text, e.g., １２３４, then they will be sent, since they are conversion candidates.
Here's the thing: no one ever enters passwords or credit card numbers as two-byte text, so the cases in which they would have been sent are essentially zero. You cannot enter a credit card number (partially for this reason) as two-byte text on any e-commerce site.
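To make the single-byte vs. two-byte distinction concrete, here is a minimal Python sketch. It is not how the Baidu IME or Simeji decides what to send (I have no idea what their code looks like); it only shows that 1234 and １２３４ are different characters as far as a computer is concerned. The function names `has_fullwidth` and `to_halfwidth` are my own, made up for illustration.

```python
import unicodedata


def has_fullwidth(text: str) -> bool:
    """Return True if any character is full-width ("F") or wide ("W").

    Full-width digits such as １ are class "F"; hiragana, katakana and
    kanji are class "W". Plain ASCII digits and letters are neither.
    """
    return any(unicodedata.east_asian_width(ch) in ("F", "W") for ch in text)


def to_halfwidth(text: str) -> str:
    """Fold full-width digits and letters to their ASCII equivalents (NFKC)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for sample in ("1234", "１２３４", "１２３４はパスワードです"):
        print(repr(sample), has_fullwidth(sample))
```

Only the second and third samples contain two-byte characters, which is the kind of input the clarification says gets sent for conversion. A password typed the normal way looks like the first sample.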
The staged password theft
Getting to how all this got started: they used a phrase in Japanese that was essentially "1234 is a password," and it was entered as １２３４はパスワードです (or something like that). The camera then zooms in on a computer monitor that is capturing and displaying the Baidu IME's communication with the cloud server, and they show １２３４ being sent. At the time, I was thinking, "who uses 2-byte text for passwords?"
And the answer is:
A security company being broadcast on national TV uses 2-byte text for a password when that is the only way to trigger the reaction they want, even when it's a totally impossible situation. The whole thing was staged. NHK is usually much better than this.
Original post
According to NHK [J], the Baidu IME for PCs is sending all keystrokes, plus application and computer information, to outside servers even when the settings are explicitly set to not send information.
It's less clear what is happening with the Simeji Android IME. I haven't used it in years, since adamrocker sold it to Baidu. With Simeji, it could just be that it is set to send and receive data by default, as opposed to always sending data regardless of preference settings. Either way, I recommend avoiding it in favor of the Google Japanese IME (insert NSA joke here).
This is very different from other IMEs
Of course all IMEs have the ability to send data back home. This allows for new words to be added as they become commonly used and for general improvements to input*. The differences here are major. According to the NHK article, both Google and Just Systems (maker of ATOK) send anonymized usage statistics with explicit permission from the user. That is, sending information is opt-in.
Baidu, on the other hand, does the exact opposite. Data are sent by default. Data are not anonymized. Raw text input is sent. You cannot opt out. If you do opt out, data are sent anyway.
* I'd argue that Swype, while I initially praised its Japanese input, does not actively collect any information about how Japanese is input. None of these suggestions has been implemented, and none of the bugs has been fixed. I feel like I basically had to teach Japanese grammar to the Swype keyboard, but with any complex sentence structure, forget about swyping in Japanese.