The task of node classification concerns a network where nodes are associated with labels, but labels are known only for some of the nodes. The task consists of inferring the unknown labels given the known node labels, the structure of the network, and other known node attributes. Common node classification approaches are based on the assumption that adjacent nodes have similar attributes and, therefore, that a node’s label can be predicted from the labels of its neighbors. While such an assumption is often valid (e.g., for political affiliation in social networks), it may not hold in some cases. In fact, nodes that share the same label may be adjacent but differ in their attributes, or may not be adjacent but have similar attributes. In this work, we present JANE (Jointly using Attributes and Node Embeddings), a novel and principled approach to node classification that flexibly adapts to a range of settings wherein unknown labels may be predicted from known labels of adjacent nodes in the network, other node attributes, or both. Our experiments on synthetic data highlight the limitations of benchmark algorithms and the versatility of JANE. Further, our experiments on seven real datasets of sizes ranging from 2.5K to 1.5M nodes and edge homophily ranging from 0.86 to 0.29 show that JANE scales well to large networks while also demonstrating an up to 20% improvement in accuracy compared to strong baseline algorithms.