A few days ago, I was thinking about what you need to know to use ChatGPT (or Bing/Sydney, or any similar service). It's easy to ask it questions, but we all know that these large language models frequently generate false answers. Which raises the question: If I ask ChatGPT something, how much do I need to know to determine whether the answer is correct?
So I did a quick experiment. As a short programming project, a number of years ago I made a list of all the prime numbers less than 100 million. I used this list to create a 16-digit number that was the product of two 8-digit primes (99999787 times 99999821 is 9999960800038127). I then asked ChatGPT whether this number was prime, and how it determined whether the number was prime.
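(If you want to check the arithmetic yourself, a few lines of Python will do it; simple trial division is plenty fast for 8-digit candidates:)

```python
def is_prime(k):
    """Trial division: fine for 8-digit numbers like these."""
    if k < 2:
        return False
    if k % 2 == 0:
        return k == 2
    d = 3
    while d * d <= k:
        if k % d == 0:
            return False
        d += 2
    return True

p, q = 99999787, 99999821
print(is_prime(p), is_prime(q))  # True True
print(p * q)                     # 9999960800038127
```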
ChatGPT correctly answered that this number was not prime. This is somewhat surprising because, if you've read much about ChatGPT, you know that math isn't one of its strong points. (There's probably a big list of prime numbers somewhere in its training set.) However, its reasoning was incorrect, and that's much more interesting. ChatGPT gave me a bunch of Python code that implemented the Miller-Rabin primality test, and said that my number was divisible by 29. The code as given had a couple of basic syntactic errors, but that wasn't the only problem. First, 9999960800038127 isn't divisible by 29 (I'll let you prove this to yourself). After fixing the obvious errors, the Python code looked like a correct implementation of Miller-Rabin, but the number that Miller-Rabin outputs isn't a factor; it's a "witness" that attests to the fact that the number you're testing isn't prime. The number it outputs also isn't 29. So ChatGPT didn't actually run the program; that's not surprising, as many commentators have noted that ChatGPT doesn't run the code it writes. It also misunderstood what the algorithm does and what its output means, and that's a more serious error.
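To make the distinction concrete, here's a minimal sketch of the standard Miller-Rabin test (my code, not ChatGPT's). When the test detects a composite, what it has in hand is a witness base, which proves compositeness without telling you anything about the factors:

```python
import random

def miller_rabin_witness(n, rounds=20):
    """Return a Miller-Rabin witness for n's compositeness, or None if n
    is (probably) prime. A witness proves n is composite; it is almost
    never a factor of n."""
    if n < 4:
        return None if n in (2, 3) else n
    if n % 2 == 0:
        return 2
    # Write n - 1 as d * 2^s with d odd
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x == 1 or x == n - 1:
            continue  # this base reveals nothing; try another
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return a  # a is a witness: n is definitely composite
    return None

n = 9999960800038127
w = miller_rabin_witness(n)
print(w is not None)  # True: n is composite
print(n % w == 0)     # almost certainly False: the witness is not a factor
```

Run it a few times and you'll get a different witness each time; none of them divides the number, and none of them is 29.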
I then asked it to reconsider the rationale for its previous answer, and got a very polite apology for being incorrect, together with a different Python program. This program was correct from the start. It was a brute-force primality test that tried every integer (both even and odd!) smaller than the square root of the number under test. Neither elegant nor performant, but correct. Yet again, because ChatGPT doesn't actually run the program, it gave me a new list of "prime factors," none of which were correct. Interestingly, it included its expected (and incorrect) output in the code:
n = 9999960800038127
factors = factorize(n)
print(factors) # prints [193, 518401, 3215031751]
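A cleaned-up version of that brute-force approach (mine; it skips even candidates after 2, which ChatGPT's didn't bother to do) shows what actually running the code would have revealed:

```python
def factorize(n):
    """Brute-force trial division: divide out candidates up to sqrt(n)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # after 2, odd candidates only
    if n > 1:
        factors.append(n)  # whatever remains is prime
    return factors

print(factorize(10403))  # [101, 103]  (small semiprime as a sanity check)

# The full 16-digit run takes tens of millions of iterations in pure
# Python; the claimed factorization can be refuted directly instead:
print(9999960800038127 % 193 == 0)       # False: 193 is not a factor
print(9999960800038127 % 99999787 == 0)  # True: the real factors are
                                         # 99999787 and 99999821
```

Given enough patience, `factorize(9999960800038127)` does return the two primes I started with, not ChatGPT's invented list.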
I'm not claiming that ChatGPT is useless; far from it. It's good at suggesting ways to approach a problem, and can lead you to the right solution, whether or not it gives you a correct answer. Miller-Rabin is interesting; I knew it existed, but wouldn't have bothered to look it up if I hadn't been prompted. (That's a nice irony: I was effectively prompted by ChatGPT.)
Getting back to the original question: ChatGPT is good at providing "answers" to questions, but if you need to know that an answer is correct, you have to either be capable of solving the problem yourself, or of doing the research you'd need to solve it. That's probably a win, but you have to be careful. Don't put ChatGPT in situations where correctness is an issue unless you're willing and able to do the hard work yourself.