InstructLab evaluation with Ansible and Wordle

During this quarter, all employees are asked to become familiar with using AI technologies. Over the last few months I explored using AI for code editing and pull request reviews, but I’ll write about that separately.

But today is another Red Hat day of learning, so I looked at something more hands-on: installing and running InstructLab on my own laptop again, and experimenting with it.

TL;DR: This just reinforced my experience from the last two years about AI being too bad and too expensive for what I would expect it to do.

Installation

I followed https://pypi.org/project/instructlab to set it up on my Intel laptop (8 HT cores, 16 GiB RAM), which was relatively painless. I don’t have an Nvidia or AMD graphics card, so I opted for the “native CPU” build.

Installation shell commands:

python3.11 -m venv --upgrade-deps venv-ilab
source venv-ilab/bin/activate
pip install 'instructlab' -C cmake.args="-DLLAMA_CUDA=off" -C cmake.args="-DLLAMA_NATIVE=on"
pip install vllm
ilab config init
ilab model download

# default granite model
ilab model serve
# or
ilab model serve --model-path ~/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf
# or
ilab model serve --model-path ~/.cache/instructlab/models/merlinite-7b-lab-Q4_K_M.gguf

ilab model chat

This takes about 25 GB of disk space and downloads three models: “Granite” (the default), “Merlinite”, and “Mistral”.

First task: Wordle puzzle solution

About a year ago I played with InstructLab for the first time. Back then I submitted a pull request for analyzing and solving Wordle puzzles. As I can’t train the model on my laptop, I was curious whether that contribution had any effect on the published models.

So I posed this initial question to all three models:

What are the possible solutions to a 5-letter wordle puzzle where the first letter is an 'e', the last letter a 'y', and there is an 'r ' which isn't the fourth letter?

Note that I left out some uninteresting boilerplate from the answers.

Granite model

A 5-letter wordle puzzle with the given constraints can be challenging, but there are a few possible solutions. Let me provide you with a list of five words that meet your criteria:

1. Elyria
2. Erudite
3. Excited
4. Iridescent
5. Yields

The answer has the right structure, appropriate verbosity, and is easy to read – but all solutions are wrong.

I started a new session, with the same question, and the answer was completely different, but also completely wrong:

That's an interesting Wordle puzzle! Since the first letter is 'e' and the last letter is 'y', and there should be an 'r' in the middle, let's try to find a 5-letter word with these constraints.

After trying out some words, we can narrow down the possibilities:

1. ERYR: This word has all three letters (E, R, Y) required, but it is too common for Wordle.
2. ENRY: This word has all three letters (N, E, R) required, and it fits the pattern. However, we need to add another letter to match the puzzle's constraints.
3. EXRY: This word has two of the required letters (E, Y), and we can add another letter to match the pattern. However, none of the remaining letters (N, R) fit the place they should be.
4. ERYX: This word has two of the required letters (E, R), but we need to add another letter to match the pattern. The letter 'X' does not fit the puzzle's constraints.
5. ENRRY: This word has all three letters (N, R, Y) required, and it fits the pattern. It is a possible solution!

So, a valid 5-letter Wordle solution with the given constraints is "ENRRY".

This time I asked it some follow-up questions:

'enrry' is not an English word, and I also said that 'r' must *not* be on the 4th position. Only valid English dictionary words are possible

Response:

Apologies for the confusion. I understand that “enrry” is not a valid English word as it contains a repetition of letters and does not follow the usual English word structure.

To address your concern about the ‘r’ not being on the 4th position, I misunderstood the puzzle constraints at first. A 5-letter word with the given constraints could still be “ENRRY”, but it is not a valid English word.

As for the 4th letter not being an ‘r’, it is possible to have other letters in that position as long as the word is a valid English word. For example, a possible solution that satisfies both constraints is “EXNYR”.

I apologize for any confusion caused by my initial response. I hope this clarifies the situation better.

I asked

‘exnyr’ is not a valid word either, and remember, the word must end in ‘y’.

But it just responded with a variant of the previous reply and suggested “EXNYR” again. At this point I gave up – it clearly has no clue what it’s talking about, and no grasp of concepts like “English word” or a letter being at the n-th position.

Merlinite model

The initial response was verbose and completely unhelpful:

A 5-letter wordle puzzle with the first letter as ‘e’, last letter as ‘y’, and an ‘r’ not in the fourth position can be solved using the following strategies:

  1. Brute force method: Since there are only 10,826 possible solutions, you can use a brute-force approach to try each word until you find the correct one.

  2. Elimination strategy: Start by guessing words with high probability letters like ‘early’ or ‘eyrie’. If the response is ‘incorrect’, try eliminating the letter at the fourth position from your guesses. For example, if the response is ‘too long’, you can rule out words that start with ‘ear’.

  3. Algorithmic approach: If you have access to a word list or computer, you can use algorithms like dynamic programming or backtracking to efficiently search for the answer. These methods systematically explore the search space by breaking down the problem into smaller subproblems and combining their solutions.

Here are some example words that fit the given conditions:

  1. Eye - E (first letter) Y (last letter)
  2. Eyer - E (first letter) Y (last letter), with ‘r’ as the third letter
  3. Eryer - E (first letter) Y (last letter), with ‘r’ as the second and fourth letters

By using these strategies and example words, you can effectively solve a 5-letter wordle puzzle with an ‘e’, ‘r’, and ‘y’. If you need help with specific words or need further clarification, please let me know!

So let’s insist:

I know how to solve wordle puzzles, I need help with this specific one. All of your example words don't have 5 letters. Can you give me some 5-letter words which fit?

Response:

Apologies for the confusion earlier. Here are some 5-letter wordle words that fit your criteria:

1. EARLY - E (first letter) A (fourth letter) R (second letter) Y (last letter)
2. ERYIE - E (first letter) R (third letter) Y (last letter) I (fifth letter) E (second letter)
3. EYRIE - E (first letter) Y (last letter) R (second letter) I (fifth letter) E (fourth letter)
4. EARSY - E (first letter) A (fourth letter) R (second letter) S (third letter) Y (last letter)
5. ERSAY - E (first letter) R (second letter) S (third letter) A (fourth letter) Y (last letter)
6. EYREY - E (first letter) Y (last letter) R (second letter) E (fifth letter) Y (third letter)
7. EREYE - E (first letter) R (third letter) E (fifth letter) Y (last letter) E (fourth letter)
8. ERYEN - E (first letter) R (third letter) Y (last letter) E (fifth letter) N (second letter)
9. ERENY - E (first letter) R (second letter) E (fifth letter) N (fourth letter) Y (last letter)
10. ERYEN - E (first letter) R (third letter) Y (last letter) E (fifth letter) N (second letter)
These 5-letter words fulfill your requirements of having an 'e', an 'r', and a 'y' with the 'r' not in the fourth position. If you have any more questions or need further clarification, please don't hesitate to ask!

One last try:

all of your answers except ‘early’ and ‘eyrie’ aren’t actual words, and ‘eryie’ does not end in ‘y’; ‘early’ is a great suggestion!

But the reply was just another variation of the previous one, with more gibberish. I gave up again: it is stuck in a loop and clearly does not understand what a word is.

Mistral model

Based on the given information, there are several five-letter words that fit the criteria. Here's a list of some possible words:

1. eerie
2. evoke
3. crane
4. reper
5. rhaps
6. euphor
7. serep
8. lively
9. merry
Note that there are other five-letter words with the given criteria, but this list should give you a good starting point. Also, keep in mind that the letter 'r' cannot be in the fourth position according to the puzzle description.

I tried a follow-up as well, but it just spat out different wrong words.

Comparison with ChatGPT

Is this just a problem with the InstructLab models? I tried ChatGPT, and it also spat out some nonsense, but at least the final proposal was half right, and it did include the correct solution “early”:

✅ Valid examples under the given constraints:
  • Every
  • Early

But apparently Wordle is too hard. How about something more serious, something I fight with in my current day job?

Help with Ansible

Simple: string splitting

When asked “Do you know about Ansible syntax and behaviour?”, Granite replies with a lengthy sermon about Ansible’s purposes and basic concepts, so it claims knowledge. Let’s start with something easy:

how can I split a string into a list in an ansible task?

To which Granite responds with

you can split a string into a list using the split filter. Here’s an example playbook that demonstrates this:

and an example playbook:

---
- name: Split a string into a list
  hosts: localhost
  gather_facts: no

  tasks:
    - name: Split a given string into a list
      debug:
        var: "{{ 'Hello World!' | split(' ') }}"

Plus some more verbiage about what a run would look like and how to do that with split_regex:

---
- name: Split a string by commas with preserving order
  hosts: localhost
  gather_facts: no

  tasks:
    - name: Split a given string by commas with preserving order
      debug:
        var: "{{ 'Name,John,Phone:+1234567890' | split_regex('\\s*,\\s*') }}"

This is very good, except that it messed up the debug: statement – printing an expression needs msg:, not var:. Asking about it:

This fails with: <class 'list'>": "VARIABLE IS NOT DEFINED!"

It rambles on about irrelevant things, but keeps making the var: error. I’m insisting:

no, the problem was that you cannot print an expression with `var:` in the debug task, you have to use `msg:` or actually assign the value to a variable first

Thank you for pointing out that mistake. In the `debug` task, you can use both `msg:` and `var:` to print expressions. The difference is that `msg:` prints the expression directly, while `var:` assigns the result of the expression to a variable first.

In your example playbook, I assumed that you wanted to print the result of the `split` filter. Therefore, I used the `msg:` syntax. However, if you want to assign the result to a variable for further use, you can use the `var:` syntax instead. Here's an updated example playbook: [...]

Well, no, you can neither remember your previous answer, nor is this what I asked. And it keeps making the mistake without understanding it.

But still, this is “good enough” and useful, as the core point (splitting the string) is correct.
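
For reference, here is a minimal sketch of how the same playbook could look with the debug: task corrected. It assumes an ansible-core version recent enough to ship the split filter; the variable name "words" is made up for this illustration.

```yaml
---
- name: Split a string into a list
  hosts: localhost
  gather_facts: false

  tasks:
    # msg: prints the result of a templated expression
    - name: Print the split result directly
      debug:
        msg: "{{ 'Hello World!' | split(' ') }}"

    # Alternative: store the value in a variable first ...
    - name: Assign the split result to a variable
      set_fact:
        words: "{{ 'Hello World!' | split(' ') }}"  # "words" is a hypothetical name

    # ... and then let var: print that variable by name
    - name: Print the variable
      debug:
        var: words
```

Either variant should avoid the “VARIABLE IS NOT DEFINED!” message, because var: is only ever given the name of an existing variable.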

Harder: conditional tasks

Let’s do a real-world problem. This is what I tripped over recently (I am not an Ansible expert yet). I asked this question:

I have these tasks in an Ansible playbook:
    - name: Get server version on non-booted systems
      command: /opt/mssql/bin/sqlservr -v
      changed_when: false
      register: mssql_version
      # duh, -v normally exits with 255
      failed_when: not mssql_version.rc in [0, 255]
      when: not mssql_is_booted

    # on booted systems, check it from the error log instead
    - name: Get server edition on booted systems
      shell: |
        set -euo pipefail
        errorlog="$(grep '^errorlogfile' {{ mssql_conf_path }} \
        2>&1 | sed 's\errorlogfile : \\')" || :
        if [ -f "${errorlog}" ]; then
          cat "${errorlog}"
        else
          cat /var/opt/mssql/log/errorlog
        fi
      changed_when: false
      register: mssql_version
      when: mssql_is_booted

    - name: Verify that the edition matches
      assert:
        that: "'{{ __verify_mssql_edition }} Edition' in mssql_version.stdout"
I expect that depending on mssql_is_booted it runs one or the other, and that the end I can query mssql_version. But in the mssql_is_booted == False case I get an error: 'dict object' has no attribute 'stdout'

Granite responds:

Your playbook is structured well, but I noticed a couple of issues. First, the `command` line should be inside the `when` clause to run the command only when the conditions are met. Here's the corrected version:
- name: Get server version on non-booted systems
  when: not mssql_is_booted
  command: /opt/mssql/bin/sqlservr -v
  register: mssql_version
  failed_when: not mssql_version.rc in [0, 255]
Second, the error log check will not work as you expect since the `errorlog` variable is not defined. I assume you meant to use the `errorlogfile` variable from your `{{ mssql_conf_path }}` Jinja2 template. Here's the corrected version:
      shell: |
        set -euo pipefail
        errorlog="$(grep '^errorlogfile' {{ mssql_conf_path }} \
        2>&1 | sed 's\errorlogfile : \\')" || :
        if [ -f "${errorlog}" ]; then
          cat "${errorlog}"
        else
          cat /var/opt/mssql/log/errorlog
        fi
      changed_when: false
The corrected playbook should look like this:
- name: Get server version on non-booted systems
  when: not mssql_is_booted
  command: /opt/mssql/bin/sqlservr -v
  register: mssql_version
  failed_when: not mssql_version.rc in [0, 255]

- name: Get server edition on booted systems
  when: mssql_is_booted
  shell: |
    set -euo pipefail
    errorlog="$(grep '^errorlogfile' {{ mssql_conf_path }} \
        2>&1 | sed 's\errorlogfile : \\')" || :
    if [ -f "${errorlog}" ]; then
      cat "${errorlog}"
    else
      cat /var/opt/mssql/log/errorlog
    fi
  changed_when: false

- name: Verify that the edition matches
  assert:
    that: "'{{ __verify_mssql_edition }} Edition' in mssql_version.stdout"
This should help you get the server version on both non-booted and booted systems as needed. Let me know if you have any further questions!

This is nonsense. The errorlog code is neither wrong nor relevant here; the answer messed up the first task, dropped the register: from the second task, and did not even consider the error message or the actual problem.

But at least it’s obvious nonsense. ChatGPT failed even worse. With conviction, it claimed:

You're absolutely right: Ansible only sets a register variable if the task runs. So when a task is skipped, it does not overwrite or touch the registered variable. Therefore, your expectation is valid: if one of the two conditional tasks runs, then mssql_version should be set correctly — and it should not be clobbered.

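… that was my naïve assumption when I wrote that code, but it is in fact wrong. If the when: condition is false, the skipped task still overwrites the registered variable, with a “skipped” result that has no stdout. That is exactly where the error message comes from: in the mssql_is_booted == False case the second task is skipped and clobbers the result of the first one.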
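
This behaviour is easy to reproduce in isolation. The following is a small sketch, not taken from my actual role: the flag "cond" and the echo commands are placeholders chosen for the demonstration.

```yaml
---
- name: Show that a skipped task still overwrites its registered variable
  hosts: localhost
  gather_facts: false
  vars:
    cond: true  # hypothetical flag, stands in for mssql_is_booted

  tasks:
    - name: This task runs, because cond is true
      command: echo hello
      changed_when: false
      register: output
      when: cond

    - name: This task is skipped, but it registers output anyway
      command: echo world
      changed_when: false
      register: output
      when: not cond

    # Prints the result of the skipped task, not the stdout of the first one
    - name: Inspect what is left in the registered variable
      debug:
        var: output
```

The last task should show a result marked as skipped, with a skip reason and without any stdout key. A test like `output is skipped` can be used to detect that state.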

I asked the Granite model directly:

I have two Ansible `command:` tasks with a mutually exclusive condition: the first one has `when: cond`, the second one `when: not cond`. Both of them have `register: output`. Is output left unchanged in a skipped task?

Yes, that's correct. When a task is skipped due to the `when: not cond` condition, the `register: output` statement will still be executed, but the `output` variable will not be updated. This behavior is intentional, as you don't want to overwrite existing variables or modify the playbook execution based on whether a task is skipped or not.

In your case, if the first `command:` task is skipped, the `output` variable will not be updated, and any changes made in the `when: cond` block will not be reflected in the playbook execution. Similarly, if the second `command:` task is executed, the `output` variable from the first task will be overwritten, but you can preserve the original output by using a different variable name for the second task's `register: output` statement.

To summarize, when a task is skipped in Ansible, the `register: output` statement will not update the `output` variable, allowing you to maintain the integrity of your playbook execution and avoid unintended side effects.

Same problem as ChatGPT.

I tried

this is actually false: a skipped task *does* overwrite the `register:` result, with something like `{"skip_reason": "Conditional result was False"}`

but then it went completely bananas, giving me random Python blobs, incoherent language, and it repeated the answer about 10 times.

Merlinite is differently wrong:

You're getting this error because the register statement in your Ansible playbook saves the output of the command as a dictionary, not as standard output (stdout) [...]

And Mistral:

I see the issue. In the first task, the `mssql_version` variable is expected to be a dictionary that contains the exit status and stdout of the command. However, when `mssql_is_booted` is False, the `mssql_version` variable is not being populated with this information because the command output is not being stored in the variable name.

To fix this issue, you can change the register keyword to "result" instead of "mssql_version". [...]

A simple, classic web search delivers the correct answer right away.
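
One possible way around the overwriting, sketched here under the assumption that separate variable names are acceptable: each task registers into its own variable, and a final set_fact picks the result of the task that actually ran. The two `__mssql_version_*` names are invented for this illustration; the tasks themselves are otherwise copied from my question above.

```yaml
- name: Get server version on non-booted systems
  command: /opt/mssql/bin/sqlservr -v
  changed_when: false
  register: __mssql_version_cmd  # hypothetical name
  # -v normally exits with 255
  failed_when: __mssql_version_cmd.rc not in [0, 255]
  when: not mssql_is_booted

# on booted systems, check it from the error log instead
- name: Get server edition on booted systems
  shell: |
    set -euo pipefail
    errorlog="$(grep '^errorlogfile' {{ mssql_conf_path }} \
    2>&1 | sed 's\errorlogfile : \\')" || :
    if [ -f "${errorlog}" ]; then
      cat "${errorlog}"
    else
      cat /var/opt/mssql/log/errorlog
    fi
  changed_when: false
  register: __mssql_version_log  # hypothetical name
  when: mssql_is_booted

# Both variables exist at this point; one of them holds a "skipped" result.
- name: Pick the result of the task that actually ran
  set_fact:
    mssql_version: "{{ __mssql_version_log if mssql_is_booted else __mssql_version_cmd }}"
```

After this, the existing assert can query mssql_version.stdout in both cases, since the skipped task no longer touches the variable that is read later.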

Conclusion

I’ve run into dozens of similar issues in recent weeks while learning Ansible. For simple questions like “how do I ignore errors from a command: task?” or “how do I assert that a variable is a non-empty list?”, InstructLab and ChatGPT give helpful answers, but these are just as easily answered with documentation, a web search, or Stack Overflow.

But for more complicated questions like the one above, or my recent problem of having an include_role which should fail under a given condition, they very quickly give you either nonsensical answers (funny, but harmless) or answers which look coherent, but are wrong (dangerous, and really frustrating).

On the weekend I also had a debugging dialog with ChatGPT about my email server’s DKIM setup (Google’s MX started to refuse some of my mails again), and it quickly led me astray with going on and on about some extra newlines in the TXT record (which in the end turned out to be irrelevant).

So at least from my perspective, InstructLab and ChatGPT don’t actually help me – I cannot put a relevant level of trust into the answers. In all these examples it’s very clear that these engines have no idea at all about the actual concepts.

In the end I need to research, learn, and RTFM about these tools (be it Ansible, DKIM, Rust, or whatever else) anyway – so I can just do this in the first place. That also gives a longer-lasting understanding than just blindly copy&pasting solutions. To be fair, the models are good at giving you some initial clues or even just words to search for. But given how extremely expensive and problematic the building and training of these models is, I will abstain for now (I’ll elaborate in my next blog post).