Echo State Networks (ESNs) are an alternative to fully trained Recurrent Neural Networks (RNNs), achieving state-of-the-art performance on time series prediction. However, they have seldom been applied to abstract tasks, and for language modeling they require far more units than traditional RNNs to reach comparable performance. In this paper we propose a novel architecture that extends a conventional Echo State Network with a pre-recurrent feature layer and a nonlinear readout. The features are learned in a supervised way using a computationally cheap version of gradient descent and automatically capture grammatical similarity between words. They modify the dynamics of the network in a way that allows it to significantly outperform an ESN alone. The addition of a nonlinear readout is also investigated, making the overall system similar to a feedforward network with a memory layer.
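As a rough illustration of the architecture described above, the forward pass can be sketched as a pre-recurrent feature layer feeding a fixed random reservoir, followed by a nonlinear readout. This is a minimal NumPy sketch under assumed dimensions and a randomly initialized feature layer; in the paper the features are trained by a cheap form of gradient descent, and the readout nonlinearity here (a single `tanh` layer) is only an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper)
vocab_size, feat_dim, res_dim, out_dim = 50, 10, 100, 50

# Pre-recurrent feature layer: maps one-hot words to dense features.
# Randomly initialized here; the paper learns it in a supervised way.
W_feat = rng.normal(0, 0.1, (feat_dim, vocab_size))

# Reservoir: fixed random recurrent weights, rescaled so the spectral
# radius is below 1 (the usual echo-state condition).
W_in = rng.normal(0, 0.5, (res_dim, feat_dim))
W_res = rng.normal(0, 1.0, (res_dim, res_dim))
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))

# Nonlinear readout (assumed: a single tanh layer; only this and the
# feature layer would be trained, the reservoir stays fixed).
W_out = rng.normal(0, 0.1, (out_dim, res_dim))

def step(state, word_id):
    """One time step: one-hot word -> features -> reservoir -> readout."""
    x = np.zeros(vocab_size)
    x[word_id] = 1.0
    f = W_feat @ x                              # learned word features
    state = np.tanh(W_in @ f + W_res @ state)   # echo-state update
    y = np.tanh(W_out @ state)                  # nonlinear readout
    return state, y

state = np.zeros(res_dim)
for w in [3, 17, 42]:                           # a toy word-id sequence
    state, y = step(state, w)
print(y.shape)  # (50,)
```

With a linear readout trained by ridge regression this would reduce to a standard ESN; the feature layer and nonlinear readout are the two extensions the abstract highlights.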