We know vim puts the terminal in “raw” mode, where it receives keystrokes as they are typed, as opposed to “cooked” mode, where input is not delivered to the program until the user presses Enter.

How does the shell decide when to go into either mode? How does this switch happen? Is there a mode in between “raw” and “cooked” mode?

To clarify, any process that has access to a terminal can change that terminal's settings, simply by calling tcsetattr() with the appropriate attributes (the same call used by the termstate_ functions in cush).
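As a minimal sketch of that pattern (not cush's actual termstate_ code), the helpers below read a terminal's current attributes, flip one flag, and write them back. The names set_echo_flag and set_terminal_echo are made up for this illustration; tcgetattr(), tcsetattr(), ECHO, and TCSANOW are the standard termios interface.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

// Hypothetical helper: turn the ECHO bit on or off in a termios struct.
// This only edits the struct; nothing changes until tcsetattr() is called.
void
set_echo_flag(struct termios *t, int on)
{
    assert (t != NULL);
    if (on)
        t->c_lflag |= ECHO;
    else
        t->c_lflag &= ~ECHO;
}

// Hypothetical helper: read-modify-write the settings of the terminal
// that fd refers to. Returns 0 on success, -1 if fd is not a terminal
// or one of the calls fails.
int
set_terminal_echo(int fd, int on)
{
    struct termios t;
    if (tcgetattr(fd, &t) != 0)
        return -1;
    set_echo_flag(&t, on);
    return tcsetattr(fd, TCSANOW, &t);
}
```

Calling set_terminal_echo(STDIN_FILENO, 0) from a program started in your terminal should turn echo off, and set_terminal_echo(STDIN_FILENO, 1) should turn it back on. No special privilege is involved; having the terminal open is enough.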

vim is an example of a process that does that. The readline() function that cush (and bash) uses also enters raw mode. That's why, for instance, Ctrl-A, Ctrl-E, and many other readline shortcuts work. When readline() returns, the terminal is set back into whatever state it was in before the call, so we don't notice. If we had implemented our shell with, say, scanf() and printf() only, we wouldn't have put the terminal into the raw state. So a shell can be implemented without raw mode, albeit with less user comfort.
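The save-and-restore behavior can be sketched as follows. This is not readline's actual code, only the general shape of "remember the old state, change it, do the work, put it back"; read_one_key is a hypothetical name chosen for this example.

```c
#include <assert.h>
#include <termios.h>
#include <unistd.h>

// Hypothetical helper, loosely modeled on what a line editor does:
// save the terminal state, switch to key-by-key input without echo,
// read one byte, then restore the saved state before returning.
// Returns the byte read (0..255), or -1 on error or end of input.
int
read_one_key(int fd)
{
    struct termios saved, t;
    unsigned char c;
    int result = -1;

    if (tcgetattr(fd, &saved) != 0)
        return -1;                  // fd is not a terminal

    t = saved;
    t.c_lflag &= ~(ICANON | ECHO);  // no line buffering, no echo
    t.c_cc[VMIN] = 1;               // read() returns after 1 byte
    t.c_cc[VTIME] = 0;              // no timeout
    if (tcsetattr(fd, TCSANOW, &t) != 0)
        return -1;

    if (read(fd, &c, 1) == 1)
        result = c;

    // put the terminal back the way we found it
    int rc = tcsetattr(fd, TCSANOW, &saved);
    assert (rc == 0);
    return result;
}
```

From the caller's point of view the terminal looks untouched after read_one_key(STDIN_FILENO) returns, which is why you never see the mode switch when using a readline-based shell.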

As to what "raw" mode is and how to enter it: it turns out that "raw" vs. "cooked" isn't actually the official terminology (anymore). The terms come from Version 7 Unix. In POSIX, what's commonly called "raw" mode is a combination of switches. tcsetattr(3) describes it as:

Raw mode

cfmakeraw() sets the terminal to something like the "raw" mode of the old Version 7 terminal driver: input is available character by character, echoing is disabled, and all special processing of terminal input and output characters is disabled.  The terminal attributes are set as follows:

termios_p->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP
                | INLCR | IGNCR | ICRNL | IXON);
termios_p->c_oflag &= ~OPOST;
termios_p->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
termios_p->c_cflag &= ~(CSIZE | PARENB);
termios_p->c_cflag |= CS8;

You can look up what all these attributes mean, but the key attribute here is ICANON. POSIX refers to line-by-line processing as "canonical" mode and to key-by-key processing as "non-canonical" mode.
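On the question of an in-between mode: because these are independent flags, a program can clear some and keep others. A common combination, often called "cbreak" mode after the old Version 7 setting and the curses function of that name, turns off ICANON and ECHO but leaves ISIG and output processing alone, so input arrives key by key while Ctrl-C and Ctrl-Z still work. Here is a sketch; make_cbreak, classify_input_mode, and the enum names are invented for this example and are not POSIX terms.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

// Informal labels for this example only.
enum input_mode {
    MODE_CANONICAL,         // ICANON set: line by line ("cooked")
    MODE_KEYS_WITH_SIGNALS, // ICANON clear, ISIG set: the in-between case
    MODE_KEYS_NO_SIGNALS    // ICANON and ISIG clear: as in "raw" mode
};

// Hypothetical helper: edit a termios struct so that input is delivered
// key by key without echo. ISIG and OPOST are deliberately left alone.
void
make_cbreak(struct termios *t)
{
    assert (t != NULL);
    t->c_lflag &= ~(ICANON | ECHO);
    t->c_cc[VMIN] = 1;      // read() waits for at least 1 byte
    t->c_cc[VTIME] = 0;     // and never times out
}

// Hypothetical helper: classify the input mode described by *t.
enum input_mode
classify_input_mode(const struct termios *t)
{
    assert (t != NULL);
    if (t->c_lflag & ICANON)
        return MODE_CANONICAL;
    if (t->c_lflag & ISIG)
        return MODE_KEYS_WITH_SIGNALS;
    return MODE_KEYS_NO_SIGNALS;
}
```

As with any termios change, make_cbreak() only edits the struct; the settings take effect once the struct is passed to tcsetattr().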

If you want to try out raw mode yourself, here's a short program:

// raw.c
#include <stdio.h>
#include <termios.h>
#include <unistd.h>
#include <fcntl.h>
#include <assert.h>

int
main()
{
    int terminal_fd = open(ctermid(NULL), O_RDWR);
    assert (terminal_fd != -1);

    // fetch the current settings and keep a copy so they can be restored
    struct termios tty_state;
    int rc = tcgetattr(terminal_fd, &tty_state);
    assert (rc == 0);
    struct termios saved_tty_state = tty_state;

    tty_state.c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP
                    | INLCR | IGNCR | ICRNL | IXON);
    tty_state.c_oflag &= ~OPOST;
    tty_state.c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
    tty_state.c_cflag &= ~(CSIZE | PARENB);
    tty_state.c_cflag |= CS8;

    printf("press ctrl-d to exit\n");
    rc = tcsetattr(terminal_fd, TCSANOW, &tty_state);
    assert (rc == 0);

    // echo each byte back until Ctrl-D (0x4) arrives
    char c;
    while (read(0, &c, 1) == 1 && c != 0x4)
        write(1, &c, 1);

    // restore sane state on exit
    rc = tcsetattr(terminal_fd, TCSANOW, &saved_tty_state);
    assert (rc == 0);
}

Compile with gcc -o raw raw.c and then start it with ./raw. Everything you type is echoed back. Ctrl-C and Ctrl-Z don't work anymore. If you press Enter, the cursor goes to the beginning of the line, because Enter sends a carriage return that is no longer translated into a newline (type Ctrl-J to go to the next line). Type Ctrl-D (0x4) to exit.