Date : July 17, 2024

Company : Intellectus Corp.

AARON LIM, Software Designer Computer Science BS @ Purdue University - West Lafayette


Introduction

If you had a question, would you use Google or ChatGPT? Well, it would depend on the question. If it were a straightforward question like the definition of a word, you could use a keyword search engine like Google. However, if a question requires comprehension, you would probably get better results using a large language model like ChatGPT. But GPT isn’t always accurate, and Google isn’t always rational. Each method has advantages and disadvantages, so what if there were a way to combine the two and get the best of both worlds?

Large language models (LLMs) like ChatGPT have influenced many current technologies, and many teams are looking for ways to incorporate LLMs into their products. Yet, since LLMs are still a recent development, the full extent of their potential has not been discovered. In contrast, knowledge graphs (KGs) are an older technology that remains widely used. KGs form the backbone of search engines like Google and are crucial for organizing information and retrieving data. At first glance, LLMs and KG query algorithms look similar. However, upon closer inspection, each technology is weak where the other excels. This has sparked recent interest in how to combine the two.

LLMs take a prompt and reply to it. This is useful in areas such as automating natural language conversations (chatbots). Meanwhile, KGs are a way to structure relational data. Once a KG is constructed, it can be queried for a relational fact and will return the answer. The similarity between the two technologies is that people can prompt or query either one to get an answer to a problem. However, their different implementations give them different properties, and it turns out that one's weakness is the other's strength. KGs have existed for a while but have limitations, such as narrow coverage and costly construction, that LLMs could help with. On the other hand, LLMs sometimes struggle to be accurate and transparent, which KGs could help with. This article summarizes and examines current trends in combining LLMs and KGs.


Large Language Models (LLM) Basics

An LLM is a machine learning model. The key idea for understanding LLMs is prediction: an LLM predicts which word (more precisely, which token) fits best within the provided context. Chatbots like ChatGPT use this predictive ability, but LLMs can also be used in other situations. Anytime something is written in natural language, an LLM can predict which words will fit best. LLMs are trained on a large corpus of text (hence the "large" in the name, which also refers to the size of the model itself). During training, the model learns from human-written text which words tend to go with which other words, and what kind of answer tends to follow what kind of question. When you query a chatbot, the LLM responds with the answer it deems most appropriate to your question, drawing on patterns from similar answers to similar questions written by humans. In other words, it predicts which answer would fit best.
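
To make "prediction" concrete, here is a minimal sketch that predicts the next word by counting which word most often follows another in a tiny corpus. It is a toy stand-in, not how a real LLM is built: the corpus and the function names are invented for illustration, and a real model uses a neural network over far more context than one preceding word.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str) -> dict:
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def predict_next(follows: dict, word: str):
    """Return the most frequent follower of `word`, or None if the word was never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat . the cat ate the fish . the cat slept"
model = train_bigram(corpus)
print(predict_next(model, "the"))  # the word seen most often after "the"
print(predict_next(model, "dog"))  # a word the model never saw
```

The second call shows the limitation: a purely count-based predictor has nothing to say about unseen words, which is one reason real LLMs work with learned numerical representations instead.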

LLMs are very popular because they seem to understand natural human language. This ability is being used in a wide variety of fields, for example in interfaces where a machine that mimics human conversation can replace a human operator. LLMs are also effective at human-to-machine tasks, like translating natural language into programming languages, or turning data formats into human-readable text.

One drawback of LLMs is their lack of transparency: LLMs are considered black-box algorithms in that we know the input and output but cannot fully understand the internal computation. LLMs are built on a neural network architecture called a transformer. Transformers come in three main variants: encoder-only, decoder-only, and encoder-decoder. Transformers “understand” natural language by translating words (or word pieces called tokens) into multi-dimensional numerical representations called embeddings. Each token has its own embedding, but a human reader cannot easily interpret the numbers. Transformers also use a mechanism called self-attention, which weighs how relevant each part of the input is to every other part. Although we can inspect which words receive more attention, those weights come from learned parameters, so we cannot say exactly why the model attends to certain words. As a result, it is hard to understand exactly why and how an LLM produces the output it does.
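
The self-attention idea can be sketched in a few lines. The example below computes scaled dot-product attention weights over three made-up 2-dimensional embeddings. It is a simplification under stated assumptions: the embeddings are invented, and the learned projection matrices that real transformers apply before comparing tokens are left out.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(embeddings):
    """Scaled dot-product self-attention with queries = keys = values = embeddings.

    Real transformers first multiply the embeddings by learned matrices;
    that step is omitted in this sketch.
    """
    dim = len(embeddings[0])
    weights, outputs = [], []
    for query in embeddings:
        scores = [dot(query, key) / math.sqrt(dim) for key in embeddings]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all embeddings.
        outputs.append([sum(w * vec[i] for w, vec in zip(row, embeddings))
                        for i in range(dim)])
    return weights, outputs

# Made-up embeddings for three tokens; the first two are deliberately similar.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights, outputs = self_attention(embeddings)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how much one token attends to every token. The numbers are easy to read here, but in a real model they depend on millions of learned parameters, which is where the black-box problem comes from.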


Knowledge Graphs (KG) Basics

KGs are a type of data structure. Unstructured data like text is hard to query and analyze. KGs address this problem because they can store not only keywords but also the meaning of the data. Knowledge graphs are used to store relational data, and there is a lot of relational data on the internet. A KG is composed of entity nodes, which store things like names and objects, and relation edges, which store the relationship between two entities. A web of these nodes and edges forms a graph. Usually, this information is stored as a semantic triple (in RDF, a subject-predicate-object triple): "{head, relation, tail}" where "head" is the first entity, "tail" is the second entity, and "relation" is how the head relates to the tail. This type of data structure can be formally queried using query languages such as SPARQL (for RDF graphs) or Cypher (for property graphs). With KGs, previously unstructured data like long pieces of text can be structured into a form that can be formally queried.
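
As a rough illustration of the triple structure, the sketch below stores a handful of "{head, relation, tail}" triples in a Python list and answers a query by pattern matching. The facts, the relation names, and the `match` helper are chosen for illustration; a production KG would live in a graph database and be queried with SPARQL or Cypher.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    head: str
    relation: str
    tail: str

# Example facts, chosen for illustration.
graph = [
    Triple("Purdue University", "located_in", "West Lafayette"),
    Triple("West Lafayette", "located_in", "Indiana"),
    Triple("Purdue University", "type", "University"),
]

def match(graph, head=None, relation=None, tail=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (head is None or t.head == head)
        and (relation is None or t.relation == relation)
        and (tail is None or t.tail == tail)
    ]

# "Where is Purdue University located?" expressed as a pattern with an open tail.
answers = [t.tail for t in match(graph, head="Purdue University", relation="located_in")]
print(answers)
```

Leaving one position open and fixing the other two is the basic shape of most KG queries; query languages add joins across several such patterns.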

Due to this design, knowledge graphs are very flexible in the data they can store. For example, even a simple data table can be transformed into a knowledge graph. Say you have a table with customers' names and their information, like email addresses. You could store each name and email as entities and link them with a relation. Another example is storing classes. If you are working with something like a large database, you might want to use a KG to describe the relations between different classes and subclasses. KGs are very versatile at storing all kinds of information, which is a large benefit in this data-driven era.
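
The table example can be sketched as follows. The customer rows, the `has_` prefix for relation names, and the `rows_to_triples` helper are all hypothetical, invented here to show the shape of the conversion.

```python
def rows_to_triples(rows, key_column):
    """Turn each non-empty, non-key cell into a (key value, relation, cell value) triple."""
    triples = []
    for row in rows:
        subject = row[key_column]
        for column, value in row.items():
            if column != key_column and value:
                triples.append((subject, f"has_{column}", value))
    return triples

# Hypothetical customer table.
customers = [
    {"name": "Alice", "email": "alice@example.com", "city": "Chicago"},
    {"name": "Bob", "email": "bob@example.com", "city": ""},
]
customer_triples = rows_to_triples(customers, key_column="name")
for triple in customer_triples:
    print(triple)
```

Note that the empty cell for Bob simply produces no triple, which is one way a graph handles missing values more gracefully than a fixed-width table.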

Knowledge Graph Construction Process. Generated by Processon. (Fang, 2024)

One drawback of KGs is their tedious construction process. First, entities, relations, and attributes must be extracted from the data. Then, duplicate entities need to be resolved and merged. The process does not end there; as more information is added, the KG will continue to grow and more data will have to be resolved. Previously, these steps were largely done manually.
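
The "merge duplicates" step might look something like the sketch below. The normalization rule is deliberately crude and the helper names are assumptions made for this example; real entity resolution uses much richer matching than lowercasing and dropping company suffixes.

```python
def normalize(name: str) -> str:
    """Crude canonical form: lowercase, strip punctuation, drop common company suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    tokens = [t for t in cleaned.split() if t not in {"inc", "corp", "ltd"}]
    return " ".join(tokens)

def merge_duplicates(triples):
    """Map every entity to the first spelling seen for its normalized form."""
    canonical = {}
    merged = set()
    for head, relation, tail in triples:
        head = canonical.setdefault(normalize(head), head)
        tail = canonical.setdefault(normalize(tail), tail)
        merged.add((head, relation, tail))
    return sorted(merged)

# Three extracted triples that mention the same company under different spellings.
raw = [
    ("OpenAI", "develops", "ChatGPT"),
    ("OpenAI, Inc.", "develops", "ChatGPT"),
    ("openai", "founded_in", "2015"),
]
resolved = merge_duplicates(raw)
print(resolved)
```

Even this small case shows why the work never ends: every new batch of triples has to be checked against the spellings already in the graph.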

Another drawback is that KGs cannot interpret natural language on their own. Not everyone who uses KGs knows a query language, so it would be beneficial to query the KG using natural language. The task of answering natural language questions using a KG is called knowledge graph question answering (KGQA). Previously, semantic keyword searches were used to search KGs in natural language. Search engines have used this style of KGQA: they take your prompt, identify keywords, find synonyms, and then search the knowledge graph for those keywords. However, with the rise of LLMs, search engines and other KG users are trying to bring LLMs into this search method.
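
A keyword-and-synonym search of this kind can be sketched as below. The synonym table, the tiny graph, and the `keyword_kgqa` function are invented for illustration and are far simpler than what a search engine would run.

```python
import re

# Hypothetical synonym table mapping question words to relation names in the graph.
SYNONYMS = {
    "wrote": "author_of", "written": "author_of", "author": "author_of",
    "born": "born_in", "birthplace": "born_in",
}

GRAPH = [
    ("George Orwell", "author_of", "1984"),
    ("George Orwell", "born_in", "Motihari"),
]

def keyword_kgqa(question, graph, synonyms):
    """Answer by spotting a known relation keyword and a known entity in the question."""
    text = question.lower()
    words = re.findall(r"\w+", text)
    relations = {synonyms[w] for w in words if w in synonyms}
    answers = []
    for head, relation, tail in graph:
        if relation not in relations:
            continue
        if head.lower() in text:
            answers.append(tail)   # the question names the head, so return the tail
        elif tail.lower() in text:
            answers.append(head)   # the question names the tail, so return the head
    return answers

print(keyword_kgqa("Who wrote 1984?", GRAPH, SYNONYMS))
print(keyword_kgqa("Where was George Orwell born?", GRAPH, SYNONYMS))
print(keyword_kgqa("Who penned 1984?", GRAPH, SYNONYMS))  # "penned" is not in the table
```

The last question returns nothing because "penned" is missing from the synonym table, even though the graph holds the answer. This is the gap LLMs are being brought in to close.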


Advantages and Disadvantages Alone

LLM

  • (+) generative AI
  • (+) great at processing natural language
  • (-) lack of transparency (black box)
  • (-) inaccurate on niche topics
  • (-) hallucinates false information

KG

  • (+) transparent
  • (+) accurate
  • (-) out-of-vocabulary instances
  • (-) cannot understand natural language alone

Diagram of KG queries (left) and LLMs (right). (Yang et al., 2024)

LLMs have been praised for their ability to understand and reply to natural language in an almost humanlike manner. Although LLMs are great at processing text and generating plausible results, they are not always accurate. Notably, LLMs can hallucinate, or make up facts. Because an LLM answers a prompt by determining which words fit best within the context, it can produce fluent but false statements, especially on less popular topics. Since LLMs are trained on a large internet corpus, they get common facts mostly right but often fail on niche ones. Additionally, because of how they are designed, LLMs lack transparency (the black box problem): you cannot tell which sources the LLM used to answer your question. For example, when you ask an LLM where its information came from, more often than not it will give a general answer rather than a specific source. Put simply, an LLM takes the internet corpus as data and blends the text of many different people into an answer that "makes sense," which is why it is not always accurate or transparent.

On the other hand, KGs are better at answering queries about specific information accurately, as long as the requested information is in the knowledge graph. KGs cannot hallucinate answers because every answer comes directly from the stored data, so as long as the data that makes up the KG is correct, the answer is correct. Querying is also a transparent process, as KG query algorithms search the KG for the specific nodes and relations that answer the query. However, KGs fall victim to out-of-vocabulary (OOV) instances: a query may fail because it uses words the system does not recognize or because the graph does not contain the requested information. In short, KGs are accurate but narrow, while LLMs are broad but less accurate.


Working Together

A graph of the number of arXiv publications with the keywords “knowledge graph large language model” from 2015 to the current day. (arXiv Trends Tool)

Much research is being done on combining KGs and LLMs because of how well they complement each other. A quick arXiv Trends search shows that the number of papers containing the keywords “large language models” and “knowledge graph” has risen steadily since 2022 (the year ChatGPT was released). Currently, there seem to be three main ways people are combining these two technologies:

  • KG enhanced LLMs
  • LLM enhanced KG Queries
  • Using LLMs to construct KGs


KG Enhanced LLMs

A diagram of a typical RAG. (AWS)

One way to bring KGs into LLMs is retrieval-augmented generation (RAG). RAG is a method of improving LLM output by backing the model directly with selected data, and it was introduced to mitigate the inaccuracy that comes from hallucinations. Essentially, a RAG system retrieves material relevant to the question from an external data source and passes it to the LLM along with the question, so the answer is grounded in that data rather than only in what the model absorbed during training. Since KGs are a type of data structure, they can serve as that data source, giving graph retrieval-augmented generation (GRAG): a form of RAG that uses a graph as its knowledge source instead of a different data structure.

Another benefit of RAG is that, compared to a plain LLM, it is more transparent because it can point directly to its source. By using RAG, we get to communicate with a seemingly rational agent that understands natural language and also refers to concrete information when answering questions. As a metaphor, if an LLM were a person, RAG would be that same person equipped with a book on a specific topic, able to look things up and cite their claims.

When a GRAG system is asked a question, it can point directly to the relationship triple(s) it used. Therefore, if prompted, it can explain where its answer came from, which addresses the LLM's transparency problem. This method of combining LLMs and KGs results in a system that can reply to natural language rationally and accurately, with transparency.
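
A minimal sketch of the GRAG flow, under several assumptions: the small graph is chosen for illustration, retrieval is a naive entity-name match, and `canned_llm` is a placeholder that returns a fixed string where a real system would call a language model. None of these names come from a specific library.

```python
# Small example graph, chosen for illustration.
KG = [
    ("Marie Curie", "won", "Nobel Prize in Physics"),
    ("Marie Curie", "born_in", "Warsaw"),
    ("Albert Einstein", "born_in", "Ulm"),
]

def retrieve_triples(question, graph):
    """Naive retrieval: keep triples whose head entity is mentioned in the question."""
    text = question.lower()
    return [t for t in graph if t[0].lower() in text]

def build_prompt(question, triples):
    """Number the retrieved facts so the answer can cite them."""
    facts = "\n".join(f"[{i}] {h} {r} {t}" for i, (h, r, t) in enumerate(triples, 1))
    return (
        "Answer using only the facts below and cite the fact number.\n"
        f"Facts:\n{facts}\n"
        f"Question: {question}"
    )

def answer_with_sources(question, graph, llm):
    """Return the model's answer together with the triples it was shown."""
    sources = retrieve_triples(question, graph)
    return llm(build_prompt(question, sources)), sources

def canned_llm(prompt):
    """Placeholder for a real model call; the reply is hard-coded for this demo."""
    return "Warsaw [2]"

question = "Where was Marie Curie born?"
prompt = build_prompt(question, retrieve_triples(question, KG))
answer, sources = answer_with_sources(question, KG, canned_llm)
print(prompt)
print(answer, sources)
```

Because the triples shown to the model are returned alongside the answer, a user can check the cited fact directly, which is the transparency benefit described above.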


LLM Enhanced KG Queries

This method focuses on the opposite: how can LLMs enhance KGs? Before the recent interest in LLMs, KGs were queried by different means such as semantic searches. Now, LLMs can benefit KGQAs by solving out-of-vocabulary problems, streamlining the query itself, and introducing common sense. There are multiple ways LLMs can be implemented into the KGQA process.

As mentioned, LLMs are great at interpreting natural language text. Since KGQA requires understanding natural language questions, an LLM can be placed in front of the KG to interpret the question and translate it into a KG query language. One direct way to do this is to ask an LLM chatbot like ChatGPT to translate natural language into a graph query language. However, there are also more nuanced implementations that build on this method.
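
The direct translation approach can be sketched as a prompt that describes the graph schema plus a cheap check on whatever comes back. The schema, the prompt wording, the guard, and `handwritten_llm` (a stand-in that returns a hand-written Cypher query instead of calling a model) are all assumptions for this example.

```python
import re

WRITE_KEYWORDS = {"CREATE", "DELETE", "MERGE", "SET", "DROP", "REMOVE"}

def build_translation_prompt(question, node_labels, relation_types):
    """Describe the graph schema so the model is steered toward names that exist."""
    return (
        "Translate the question into a single Cypher query.\n"
        f"Node labels: {', '.join(node_labels)}\n"
        f"Relationship types: {', '.join(relation_types)}\n"
        "Return only the query.\n"
        f"Question: {question}"
    )

def looks_like_read_query(query):
    """Crude guard: the query must start with MATCH, contain RETURN, and use no write keyword."""
    words = re.findall(r"[A-Za-z]+", query.upper())
    return (bool(words) and words[0] == "MATCH" and "RETURN" in words
            and not WRITE_KEYWORDS & set(words))

def handwritten_llm(prompt):
    """Stand-in for a real chat model; returns a hand-written query."""
    return "MATCH (p:Person)-[:AUTHOR_OF]->(b:Book {title: '1984'}) RETURN p.name"

translation_prompt = build_translation_prompt(
    "Who wrote 1984?", ["Person", "Book"], ["AUTHOR_OF"]
)
cypher = handwritten_llm(translation_prompt)
print(cypher if looks_like_read_query(cypher) else "rejected")
```

Checking the generated query before running it matters because the model's output is still a prediction; a guard like this is only a first line of defense and would reject some valid queries too.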

API Search that combines LLM AI chains and knowledge graphs. (Huang et al., 2023)

Another way LLMs are being used is for query clarification. In some scenarios, it is appropriate for the KGQA system to ask for clarification. For example, in a paper on KGQA for API searches (Huang et al., 2023), the authors devised a clarification system that calls on LLMs multiple times. Instead of just asking a query and receiving an answer, the user asks a query, provides clarification (as much as they can), and then receives an answer. The LLM in this system proposes clarifying questions that could help pin down the user's specific needs. This is a novel way of querying, as the process requires a back-and-forth between the user and the system so that the user's specific needs are met.

A side benefit of including LLMs in KGQA is the ability to chain LLMs for verification and better accuracy. Several systems chain LLMs with different purposes to increase accuracy. In the API search algorithm mentioned previously, the researchers used five models for these five tasks:

  • generating which aspect of the prompt should be questioned
  • generating clarifying questions
  • generating alternative questions
  • extending the query
  • recommending the API

The output of one LLM becomes the input of another, forming an "AI chain." The process also includes iteration, allowing users to clarify their query even further. Researchers have recommended adding LLM chains and iteration to both LLM and KGQA systems, as there is evidence that these techniques increase the accuracy of the answer.
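
The chaining idea, where one step's output becomes the next step's input, can be sketched as below. The step names follow the five tasks listed above, but the prompt wording, the strictly linear flow, and `echo_llm` (a stand-in that only tags its input) are assumptions; the actual system in the paper also involves the user between steps.

```python
def run_chain(query, steps, llm):
    """Feed the output of each step into the next one and keep a trace of every call."""
    trace = []
    current = query
    for name, template in steps:
        prompt = template.format(input=current)
        current = llm(prompt)
        trace.append((name, prompt, current))
    return current, trace

# Step names mirror the five tasks above; the prompt wording is invented.
CHAIN_STEPS = [
    ("aspect", "Which aspect of this request is unclear? {input}"),
    ("clarifying_question", "Write a clarifying question about: {input}"),
    ("alternatives", "Suggest alternative questions for: {input}"),
    ("extended_query", "Rewrite the request with the added detail: {input}"),
    ("recommendation", "Recommend an API for: {input}"),
]

def echo_llm(prompt):
    """Stand-in for a model call: wraps the prompt so the data flow stays visible."""
    return f"<reply to: {prompt}>"

result, trace = run_chain("find an API to read a CSV file", CHAIN_STEPS, echo_llm)
for name, prompt, output in trace:
    print(name, "->", output[:60])
```

Keeping a trace of every intermediate prompt and reply is a small design choice that pays off: when the final recommendation is wrong, the trace shows which link in the chain went astray.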

UniOQA Framework Diagram. (Li et al., 2024)

Another example of LLM-enhanced KGQA: Xidian University researchers encourage parallel use of LLMs and KGQA (Li et al., 2024). Their framework uses two different ways to turn a natural language question into an answer. One is called the translator, the other the searcher. The translator has the LLM translate the query prompt into Cypher Query Language (CQL). Then, Entity and Relation Replacement aligns the generated query with the closest entities and relations in the knowledge graph, and the query is run to find an answer. The searcher works in the opposite order: it first matches the prompt's keywords with the KG's entities and relations, then extracts the related triples, then uses RAG to come up with an answer. The two answers are then compared using a ranking algorithm, and the better one is output. By running a KGQA pipeline and an LLM in parallel, the framework can increase accuracy.
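
The "run both, then pick the better answer" structure can be sketched roughly as follows. Everything here is a placeholder: the two branches return hard-coded answers instead of running a translator or a searcher, and the `support` score is a hypothetical ranking rule invented for this example, not the ranking algorithm from the paper.

```python
# Small example graph, chosen for illustration.
DRUG_KG = [("Aspirin", "treats", "Headache"), ("Aspirin", "treats", "Fever")]

def translator_branch(question):
    """Stand-in for: LLM writes a query, entities are aligned, the query runs on the KG."""
    return ["Headache"]  # pretend the generated query matched only one triple

def searcher_branch(question):
    """Stand-in for: retrieve related triples, then let an LLM answer from them."""
    return ["Headache", "Fever"]

def support(answer, graph):
    """Hypothetical score: how many answer items appear as a tail in the graph."""
    tails = {tail for _, _, tail in graph}
    return sum(1 for item in answer if item in tails)

def choose_answer(candidates, score):
    """Return the highest-scoring non-empty candidate as (branch name, answer)."""
    usable = [(name, answer) for name, answer in candidates.items() if answer]
    if not usable:
        return None
    return max(usable, key=lambda item: score(item[1]))

question = "What does aspirin treat?"
candidates = {
    "translator": translator_branch(question),
    "searcher": searcher_branch(question),
}
best = choose_answer(candidates, lambda answer: support(answer, DRUG_KG))
print(best)
```

The point of the parallel design is that the two branches fail in different ways, so a comparison step can often recover a good answer when one of them falls short.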


Using LLMs to construct KGs

Another way that LLMs can be integrated into KG systems is to use them to automate and aid KG construction. KG construction is a costly process. There are many steps to convert different data types into a KG, not to mention the size of some KGs. With LLMs, this conversion can be automated instead of done manually.

Some frameworks use LLMs to extract entities and relations, while others directly ask LLMs to extract triples. A common use of knowledge graphs is to structure pieces of text. So, for example, you could prompt an LLM to summarize a piece of text as semantic triples. The LLM then creates a knowledge graph directly by turning text into these triples. Several papers report that this method of creating KGs is quite effective, especially if prompt-tuned. However, it can still fall victim to hallucinations.
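
Direct triple extraction might be wired up like the sketch below. The prompt wording, the "head | relation | tail" output format, and the hand-written `reply` (standing in for a model response) are assumptions made for this example. The parser also applies a simple hallucination check: it drops any triple whose head or tail never appears in the source text.

```python
def build_extraction_prompt(text):
    """Ask for one triple per line in a format that is easy to parse."""
    return (
        "Extract the facts from the text as triples.\n"
        "Write one triple per line in the form: head | relation | tail\n"
        f"Text: {text}"
    )

def parse_triples(reply, source_text):
    """Parse 'head | relation | tail' lines, skipping malformed or unsupported ones."""
    source = source_text.lower()
    triples = []
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3 or not all(parts):
            continue  # not a well-formed triple
        head, relation, tail = parts
        if head.lower() in source and tail.lower() in source:
            triples.append((head, relation, tail))
    return triples

text = "Marie Curie was born in Warsaw."
extraction_prompt = build_extraction_prompt(text)
# Hand-written stand-in for a model reply, including one unsupported line and one malformed line.
reply = "Marie Curie | born_in | Warsaw\nMarie Curie | born_in | Paris\nnot a triple"
extracted = parse_triples(reply, text)
print(extracted)
```

The check is strict on purpose: it would also reject correct triples that paraphrase the text, so a real pipeline would need a softer comparison.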

Service domain knowledge extraction framework of BEAR. (Yu, Huang, Liu, & Wang, 2023)

One example of a KG construction framework that uses LLMs is BEAR. BEAR takes text data and makes an LLM (like GPT) extract knowledge like entities and relations. The picture above shows the process. Knowledge and ontology are extracted and fed to GPT, which creates the knowledge graph. The KG can then be queried.


Conclusion

KGs and LLMs have a lot of synergy, so it is easy to see why researchers are exploring ways to combine the two. LLMs bring natural language processing and rationality, while KGs bring grounded knowledge and transparency. Working together, they make technologies that benefit from all these traits.

Each of the three methods mentioned in this article has a place today. KGs are still a powerful tool for capturing and structuring information that other data structures struggle with, so as long as people need to extract information from large datasets, KGs will stay. KG-enhanced LLMs are a natural progression for LLMs, as the only thing better than artificial intelligence is informed artificial intelligence. Recognizing the importance of combining these two technologies, software companies are finding ways to incorporate either KG-enhanced LLMs or LLM-enhanced KGs into modern search engines. And since KGs remain an important data structure, they will continue to be built, probably with the help of LLMs.

Even though these methods show much promise, it is important to recognize possible flaws, even within these combined methods. For example, although LLM-enhanced KGs promise better accuracy, LLMs can still hallucinate within the proposed structures. The studies surveyed here suggest that combining LLMs and KGs improves results, but that does not imply the combined systems are perfect. It is crucial to be wary of possible weaknesses and advance accordingly.

As for the future of these technologies, there is still a lot of promise. This article has covered certain methods, but not all of the methods currently being researched. For example, researchers are finding ways to use KGs to train and fine-tune LLMs, or even to use LLMs to analyze KGs. There are still many applications to explore, which is good news: there is plenty of room to improve.

As their collaboration is further researched, technologies that combine KGs and LLMs will continue to improve. This article has covered a few examples of how LLMs and KGs are being combined, but there are many more. LLMs are already changing many fields and will continue to do so; KGs have been revolutionary in how we store and retrieve data. Together, they can reshape how knowledge is used.


References

Amazon Web Services. (n.d.). What are transformers? - Transformers in artificial intelligence explained - aws. Amazon Web Services, Inc. Retrieved July 8, 2024, from https://aws.amazon.com/what-is/transformers-in-artificial-intelligence/

Amazon Web Services. (n.d.). What is rag? - Retrieval-augmented generation explained - aws. Amazon Web Services, Inc. Retrieved June 21, 2024, from https://aws.amazon.com/what-is/retrieval-augmented-generation/

Edge, D., Trinh, H., Cheng, N., Bradley, J., Chao, A., Mody, A., Truitt, S., & Larson, J. (2024). From local to global: A graph rag approach to query-focused summarization (arXiv:2404.16130). arXiv. https://doi.org/10.48550/arXiv.2404.16130

Fang, Q. (2024, April 3). Unlocking intelligence: The journey from data to knowledge graph. Medium. https://medium.com/@researchgraph/unlocking-intelligence-the-journey-from-data-to-knowledge-graph-4d7a08e5f4e0

Guo, W., Toroghi, A., & Sanner, S. (2024). Cr-lt-kgqa: A knowledge graph question answering dataset requiring commonsense reasoning and long-tail knowledge (arXiv:2403.01395). arXiv. https://doi.org/10.48550/arXiv.2403.01395

Huang, Q., Wan, Z., Xing, Z., Wang, C., Chen, J., Xu, X., & Lu, Q. (2023). Let’s chat to find the apis: Connecting human, llm and knowledge graph through ai chain (arXiv:2309.16134). arXiv. https://doi.org/10.48550/arXiv.2309.16134

Kau, A. (2024a, April 3). Automated knowledge graph construction with large language models. Medium. https://medium.com/@researchgraph/automated-knowledge-graph-construction-with-large-language-models-150512d1bc22

Kau, A. (2024b, May 13). Automated knowledge graph construction with large language models—Part 2. Medium. https://medium.com/@researchgraph/automated-knowledge-graph-construction-with-large-language-models-part-2-b107ca8ec5ea

Li, Z., Deng, L., Liu, H., Liu, Q., & Du, J. (2024). Unioqa: A unified framework for knowledge graph question answering with large language models (arXiv:2406.02110). arXiv. https://doi.org/10.48550/arXiv.2406.02110

Liang, S., Stockinger, K., de Farias, T. M., Anisimova, M., & Gil, M. (2021). Querying knowledge graphs in natural language. Journal of Big Data, 8(1), 3. https://doi.org/10.1186/s40537-020-00383-w

Liu, L., Wang, Z., Qiu, R., Ban, Y., Chan, E., Song, Y., He, J., & Tong, H. (2024). Logic query of thoughts: Guiding large language models to answer complex logic queries with knowledge graphs (arXiv:2404.04264). arXiv. https://doi.org/10.48550/arXiv.2404.04264

Meyer, L.-P., Stadler, C., Frey, J., Radtke, N., Junghanns, K., Meissner, R., Dziwis, G., Bulert, K., & Martin, M. (2024). Llm-assisted knowledge graph engineering: Experiments with chatgpt (pp. 103–115). http://arxiv.org/abs/2307.06917

Pan, S., Luo, L., Wang, Y., Chen, C., Wang, J., & Wu, X. (2024). Unifying large language models and knowledge graphs: A roadmap. IEEE Transactions on Knowledge and Data Engineering, 36(7), 3580–3599. https://doi.org/10.1109/TKDE.2024.3352100

Toroghi, A., Guo, W., Pour, M. M. A., & Sanner, S. (2024). Right for right reasons: Large language models for verifiable commonsense knowledge graph question answering (arXiv:2403.01390; Version 1). arXiv. https://doi.org/10.48550/arXiv.2403.01390

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., & Zhou, D. (2023). Chain-of-thought prompting elicits reasoning in large language models (arXiv:2201.11903). arXiv. https://doi.org/10.48550/arXiv.2201.11903

Wu, J. J. (2024). Large language models should ask clarifying questions to increase confidence in generated code (arXiv:2308.13507). arXiv. https://doi.org/10.48550/arXiv.2308.13507

Yang, L., Chen, H., Li, Z., Ding, X., & Wu, X. (2024). Give us the facts: Enhancing large language models with knowledge graphs for fact-aware language modeling (arXiv:2306.11489). arXiv. https://doi.org/10.48550/arXiv.2306.11489

Ye, H., Zhang, N., Chen, H., & Chen, H. (2023). Generative knowledge graph construction: A review (arXiv:2210.12714). arXiv. https://doi.org/10.48550/arXiv.2210.12714

Yu, S., Huang, T., Liu, M., & Wang, Z. (2023). Bear: Revolutionizing service domain knowledge graph construction with llm. In F. Monti, S. Rinderle-Ma, A. Ruiz Cortés, Z. Zheng, & M. Mecella (Eds.), Service-Oriented Computing (pp. 339–346). Springer Nature Switzerland. https://doi.org/10.1007/978-3-031-48421-6_23

Zhang, B., & Soh, H. (2024). Extract, define, canonicalize: An llm-based framework for knowledge graph construction (arXiv:2404.03868). arXiv. https://doi.org/10.48550/arXiv.2404.03868