Emerging Technology

Stumbling towards AGI (Artificial General Intelligence)

Elegant Architecture overcomes limited and messy implementation?

A new article discusses HuggingGPT, which uses ChatGPT as a human interface and as an executive controller that plans the tasks needed to complete a goal. The tasks are delegated to specialist AI models that perform narrow functions well. The video discusses the ideas and is a great introduction to the paper.
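To make the controller-plus-specialists idea concrete, here is a toy sketch in Python. It is not the HuggingGPT code: the planner is a keyword matcher standing in for the language model, and the two "specialist models" (caption_image, translate_to_french) are hypothetical stubs invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical specialist models: each stub stands in for a narrow AI model.
def caption_image(data: str) -> str:
    return f"caption for {data}"

def translate_to_french(data: str) -> str:
    return f"French translation of ({data})"

# Registry the controller can delegate to, keyed by task name.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "image-captioning": caption_image,
    "translation": translate_to_french,
}

def plan_tasks(goal: str) -> List[str]:
    """Stand-in for the controlling language model: turn a goal into an
    ordered task list. A real controller would ask the model for this plan."""
    plan = []
    if "describe" in goal:
        plan.append("image-captioning")
    if "French" in goal:
        plan.append("translation")
    return plan

def run(goal: str, payload: str) -> Tuple[List[str], str]:
    """Execute the plan, feeding each specialist's output into the next."""
    tasks = plan_tasks(goal)
    result = payload
    for task in tasks:
        result = SPECIALISTS[task](result)
    return tasks, result

tasks, answer = run("describe this photo in French", "photo.jpg")
print(tasks)
print(answer)
```

The point of the design is that the controller only needs to know which specialist handles which task; the specialists themselves can be swapped without touching the planning step.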

Better Search: Will ChatGPT (or similar) displace Google?

Google has become indispensable in our work and personal lives: we use it to find products and services, check reviews, find the cheapest supplier and do professional research. Google has built a $150Bn advertising revenue business on top of that ubiquity.

The ChatGPT large language model from OpenAI burst on the scene recently, attracting over a million users in a week. It boasts a conversational interface that accepts complex queries in natural language, responds in human language and allows us to refine our search, seek more detail or pursue other aspects easily. This style of interaction is extremely attractive - it’s a bit like having a hugely knowledgeable human expert on tap to instantly understand our questions and answer in an accessible paragraph or two. It raises the question “Is this the future of search?” Fuel has been added to the fire of this debate with the investment by Microsoft of a further $10Bn in OpenAI. Remember that Microsoft has long promoted Bing in competition with Google search.

But not so fast… The results from ChatGPT are not always accurate. It is based upon a predictive model which has ingested huge amounts of data from the Internet and document sources. Because it is a mathematical model based on probability, it will favour average and mainstream opinions from its training set. It can be prompted to produce factually incorrect answers which are stated very convincingly as facts. This is annoying if the recipient already knows the facts, but dangerous or misleading if the recipient does not. The model is also trained on this corpus of data at a given time, in a “batch” mode, so it may not reflect information that has been published or updated since the last training cycle.
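A deliberately tiny illustration of why a probability-based model favours whatever is most common in its training data, right or wrong. This is a frequency counter, not how ChatGPT actually works, and the four "training" sentences are made up for the example.

```python
from collections import Counter

# Made-up training snippets: the popular (wrong) claim outnumbers the correct one.
corpus = [
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is sydney",
    "the capital of australia is canberra",
]

def next_word_counts(prompt: str) -> Counter:
    """Count which word follows the prompt in the training snippets."""
    counts = Counter()
    for line in corpus:
        if line.startswith(prompt + " "):
            counts[line[len(prompt) + 1:].split()[0]] += 1
    return counts

counts = next_word_counts("the capital of australia is")
prediction = counts.most_common(1)[0][0]  # the most frequent continuation wins
print(prediction)
```

The toy model confidently picks the majority answer even though the minority answer is the correct one, which is the risk described above in miniature.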

OpenAI and others wanting to promote these kinds of systems for search will have to find ways to improve accuracy and currency of the underlying models and provide caveats to users about potential bias and inaccuracy. Meanwhile, Google, which itself has significant AI systems and probably the best, biggest data sources to train them on, can easily add a conversational interface.

To date, ChatGPT has been offered for free use (to gain experience, publicise capabilities and refine the models), but this is likely to change very soon. OpenAI does not yet have an advertising-supported model like Google's in place and is likely to try subscriptions first. But when it is no longer free, other competitors will spring up.

One smaller but interesting player is looking to offer the best of both worlds, starting now. This is Andi (andisearch.com). Andi search lets you use GPT-style prompts and provides a summary answer (much like ChatGPT), but also provides references and search results on the right to allow validation or further exploration. This is very promising! It should be an exciting time in search this year.

Just one API?

GraphQL provides a single endpoint for queries and updates

Jargon buster at the bottom of post.

First came RPC, to call a function across a network. But it was language specific and lacked standard facilities. So DCE was made to address common requirements, such as directory services, time, authentication and remote files. But it was not object-oriented, and then Smalltalk, C++, Java et al. arrived. So Microsoft devised DCOM to provide distributed services for Microsoft languages, while others backed CORBA, which provided cross-platform and cross-language services. Both required agreement on message formats ahead of time.

Enter Web Services, leveraging XML to serialise data, WSDL to describe services, UDDI to publish, find and bind to them, and SOAP to message remote objects. Great! We could now find, bind to and invoke services without prior design agreement. But it was not very efficient: it required a lot of plumbing on each end, and quite a bit of knowledge from developers.
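To give a feel for the plumbing involved, here is a minimal sketch that builds a SOAP 1.1 request envelope with Python's standard library. The service namespace, the GetCustomer operation and the CustomerId element are hypothetical; in practice they would come from the service's WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "http://example.com/customers"                # hypothetical service namespace

ET.register_namespace("soap", SOAP_NS)

def build_request(customer_id: str) -> bytes:
    """Wrap a hypothetical GetCustomer call in a SOAP Envelope/Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{SVC_NS}}}GetCustomer")
    ET.SubElement(call, f"{{{SVC_NS}}}CustomerId").text = customer_id
    return ET.tostring(envelope, encoding="utf-8")

message = build_request("42")
print(message.decode("utf-8"))
```

Even this trivial call needs an envelope, a body, two namespaces and an XML parser on both ends, which is the overhead REST set out to avoid.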

So, Roy Fielding devised REST, exploiting HTTP to provide a simple way of working with remote Resources. REST allows us to simply access remote servers and retrieve something (GET), inform about something (POST), store something (PUT), update something (PATCH) or delete something (DELETE). This is achieved by creating simple headers and a request line including the method, URL and parameters. POST, PUT and PATCH requests also carry a body.
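As a rough sketch, the five verbs look like this when built with Python's standard urllib. Nothing is sent over the network; the server address, the /customers paths and the JSON fields are hypothetical.

```python
import json
import urllib.request

BASE = "https://api.example.com"  # hypothetical server

def make_request(method, path, payload=None):
    """Build (but do not send) an HTTP request for a REST resource."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Accept": "application/json"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(BASE + path, data=data, headers=headers, method=method)

requests_to_send = [
    make_request("GET", "/customers/42"),                                    # retrieve
    make_request("POST", "/customers", {"name": "Ada"}),                     # inform / create
    make_request("PUT", "/customers/42", {"name": "Ada", "city": "London"}), # store
    make_request("PATCH", "/customers/42", {"city": "Paris"}),               # update
    make_request("DELETE", "/customers/42"),                                 # delete
]

for req in requests_to_send:
    body = req.data.decode("utf-8") if req.data else "(no body)"
    print(req.get_method(), req.selector, body)
```

Passing any of these objects to urllib.request.urlopen would actually send it; here the loop only prints the method, the path and the body so the shape of each request is visible.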

REST is very lightweight and does not need much infrastructure. Combining it with JSON made it very easy to use from within web pages and mobile applications, and it quickly took off.

But there was a problem. Each REST request gets a specific thing from the server. If there is a rich database or knowledge graph on the server, we can end up creating many REST APIs: at least one for each kind of domain object (e.g. Customer, Product, Account, Invoice etc.), and often more than one to cater for different application requirements (partial records, related records etc.). Plus we will have different APIs to query, to store, to update etc. So, a server with a database managing a score of domain concepts could quickly require 100s of APIs. Ew, that’s a lot of development, testing, deployment, documentation, maintenance…
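A back-of-the-envelope count shows how quickly this adds up. The sketch below assumes a score of domain objects, three read variants and three write operations per object; those numbers are illustrative assumptions, not measurements from any real system.

```python
# The four objects named above plus placeholders to make up "a score" (20).
domain_objects = ["customer", "product", "account", "invoice"] + [
    f"concept{i}" for i in range(5, 21)
]

read_views = ["full", "partial", "with_related"]  # assumed query variants
write_ops = ["store", "update", "delete"]         # assumed write operations

endpoints = []
for obj in domain_objects:
    endpoints += [f"query {obj} ({view})" for view in read_views]
    endpoints += [f"{op} {obj}" for op in write_ops]

print(len(domain_objects), "domain objects ->", len(endpoints), "endpoints")
```

With six endpoints per object, twenty objects already need 120 endpoints, each of which has to be built, tested, documented and maintained.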

Facebook ran into this problem at scale. Their solution was a query language processor that lives in the server as a single entry point and receives the query as a parameter of the request. This is not dissimilar to the way a relational database receives dynamic SQL requests. Now the tailoring of a response can happen in the server (more efficient) and we have only one API endpoint to maintain. Voilà. So that solved the problem for Facebook… Fortunately, they published it as GraphQL, which allows writing query and update (mutation) statements and having these fulfilled by a suitable GraphQL processor / application / database on the server. Initially, these processors were separate components, but they are starting to be embedded in database systems, especially Graph Databases. One good example is DGraph.
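Here is a small sketch of what "one endpoint, query as a parameter" looks like from the client side, using the common convention of POSTing a JSON body with "query" and "variables" keys. The endpoint URL and the schema (customer, invoices, updateCustomer) are hypothetical, and nothing is actually sent.

```python
import json
import urllib.request

ENDPOINT = "https://api.example.com/graphql"  # hypothetical single entry point

# A read: the client names exactly the fields it wants back.
QUERY = """
query CustomerInvoices($id: ID!) {
  customer(id: $id) {
    name
    invoices { number total }
  }
}
"""

# A write: same endpoint, same request shape, different statement.
MUTATION = """
mutation RenameCustomer($id: ID!, $name: String!) {
  updateCustomer(id: $id, name: $name) { id name }
}
"""

def graphql_request(document, variables):
    """Build (but do not send) a POST carrying a GraphQL document."""
    payload = json.dumps({"query": document, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

read_req = graphql_request(QUERY, {"id": "42"})
write_req = graphql_request(MUTATION, {"id": "42", "name": "Ada"})
print(read_req.full_url == write_req.full_url)
```

Both the read and the write go to the same URL with the same HTTP method; only the statement in the body changes, which is what replaces the many per-object REST endpoints.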

JARGON BUSTER:

You can also find good explanations of most of these topics on Wikipedia.

  • RPC - Remote Procedure Calls

  • DCE - Distributed Computing Environment

  • API - Application Programming Interface. A way of requesting a service or function contained in another piece of software. Most commonly used today to refer to a REST API

  • COM+ - An extension of the Microsoft Component Object Model (COM), an architecture that allowed sharing of objects between Microsoft languages.

  • DCOM - Microsoft Distributed COM. Similar to COM+, but allowing objects to be remote.

  • .NET - Microsoft component model and framework that succeeded COM and DCOM.

  • CORBA - Common Object Request Broker Architecture. An architecture for distributed object messaging across languages and technologies.

  • Web Services - A set of standards, leveraging XML, that allows requesting services across the Internet. Includes WSDL, UDDI, SOAP.

  • XML - eXtensible Markup Language. A standard for encoding data onto text with specific tagging of the meaning of the values.

  • WSDL - Web Services Description Language. An XML document describing a Web Service.

  • SOAP - Simple Object Access Protocol. A way to invoke a (remote) service in the Web Services approach. Effectively an XML message requesting a given service and expecting an XML response message.

  • UDDI - Universal Description Discovery and Integration. A protocol for publishing Web Service Descriptions and for finding these.

  • HTTP - Hypertext Transfer Protocol. The protocol of the World Wide Web, which allows hyper-linking between documents.

  • REST - Representational State Transfer. An architectural style that leverages the intrinsic functions of HTTP to support requesting services across the Internet with minimal other infrastructure.

  • JSON - JavaScript Object Notation. A way of encoding JavaScript data structures as text for transmission or sharing. Similar purpose to XML, but lighter weight.

  • GraphQL - A query language used on a client and interpreted in a server which allows easy retrieval of data using graph concepts (Nodes, properties and relationships).

  • RDF - Resource Description Framework. A standard for defining facts and knowledge using simple statements with a Subject, Predicate, Object format. Part of Semantic Web standards.

  • DGraph - a Property Graph database that supports graph schemas, RDF, JSON and GraphQL natively at web scale. Also does ACID transactions.

  • ACID - Atomic, Consistent, Isolated, Durable. Desirable attributes of transactions in a database.

#API #Services #REST #WebServices #SolutionArchitecture #Design #GraphQL