koala-moon

Tag: regex

  • Replacing ObjectId with a string in JSON using RegEx

    Problem: I have a data dump of a MongoDB query in a JSON file. I need to replace ObjectId("12345677abc") with "12345677abc".

    Using Visual Studio Code’s find and replace (with the regular expression option switched on)

    Find:

    ObjectId\("([0-9a-fA-F]{24})"\)
    

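    Replace with: "$1"

    (Use straight quotes. $1 is the text captured by the parentheses in the Find pattern. Note that {24} only matches 24-character hexadecimal ids; to also catch ids that contain hyphens, like the userId in the example, use ObjectId\("([^"]+)"\) as the Find pattern.)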

    Turns this

    "_id" : ObjectId("5e3b1890e032d225a091d43f"),
    "userId" : ObjectId("65ed1c2c-922c-4c82-b5bc-7324f69eea10"),

    To this

    "_id" : "5e3b1890e032d225a091d43f",
    "userId" : "65ed1c2c-922c-4c82-b5bc-7324f69eea10",


    Bonus: the same approach works for dates. Find

    ISODate\("([^"]+)"\)

    and replace with "$1".
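    The same find and replace can be scripted when the dump is too big to open in an editor. This is only a rough sketch in Python, not part of the original workflow: the function name to_plain_json and the sample document (including its date) are made up for illustration, and the two patterns are the ones shown in this post.

```python
import json
import re

# Patterns copied from the post; \1 in Python plays the role of $1 in VS Code.
OBJECT_ID = re.compile(r'ObjectId\("([0-9a-fA-F]{24})"\)')
ISO_DATE = re.compile(r'ISODate\("([^"]+)"\)')


def to_plain_json(text: str) -> str:
    """Unwrap ObjectId("...") and ISODate("...") into plain quoted strings."""
    text = OBJECT_ID.sub(r'"\1"', text)
    return ISO_DATE.sub(r'"\1"', text)


# Hypothetical one-document dump, made up for this example.
dump = '{ "_id" : ObjectId("5e3b1890e032d225a091d43f"), "created" : ISODate("2020-02-05T19:30:00Z") }'

cleaned = to_plain_json(dump)
print(cleaned)

# Once the wrappers are gone the text parses as ordinary JSON.
doc = json.loads(cleaned)
print(doc["_id"])
```

    An id that is not 24 hexadecimal characters is left untouched by the first pattern, which is a quick way to spot values that need the looser ([^"]+) capture group.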
  • REGEX Examples with explanations

    Matching a string that is exactly 24 characters long and contains only hexadecimal letters and digits

    Example use: MongoDB’s unique ids (ObjectId)

    The string must be 24 characters long. ^ matches the start of the string and $ matches the end; in “multi-line mode” they match the start and end of each line instead. The dot matches any character and {24} repeats it exactly 24 times.

    /^.{24}$/

    Hexadecimal uses only the letters “a to f” and the digits “0 to 9”, so swap the dot for a character class (add the i flag if capital A to F should match as well)

    /^[a-f0-9]{24}$/
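    As a quick check of the pattern, here is a small Python sketch. The helper name looks_like_object_id is an assumption made for this example; the pattern itself is the one above. Python's fullmatch is used so that a trailing line break does not slip past the $.

```python
import re

# The pattern from the post: start, 24 lowercase hex characters, end.
OBJECT_ID_RE = re.compile(r'^[a-f0-9]{24}$')
# Same pattern, ignoring case (the equivalent of adding the i flag).
OBJECT_ID_RE_I = re.compile(r'^[a-f0-9]{24}$', re.IGNORECASE)


def looks_like_object_id(value: str, ignore_case: bool = False) -> bool:
    """Return True when value is exactly 24 hexadecimal characters."""
    pattern = OBJECT_ID_RE_I if ignore_case else OBJECT_ID_RE
    return pattern.fullmatch(value) is not None


print(looks_like_object_id("5e3b1890e032d225a091d43f"))        # True
print(looks_like_object_id("5e3b1890e032d225a091d43"))         # False, only 23 characters
print(looks_like_object_id("zzzz1890e032d225a091d43f"))        # False, z is not hexadecimal
print(looks_like_object_id("5E3B1890E032D225A091D43F"))        # False, capitals not allowed
print(looks_like_object_id("5E3B1890E032D225A091D43F", True))  # True when case is ignored
```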

  • REGEX 2

    Okay, in the previous post we found the “find” tool and realised we can do so much more with it using regular expressions (regEx).

    To recap, a regEx is

    “a sequence of symbols and characters expressing a string or pattern to be searched for within a longer piece of text.”

    \d = a digit, 0 to 9

    \w = a “word” character: any letter a to z or A to Z, any digit 0 to 9, or an underscore

    \s = a whitespace character (space, tab or line break)

    Example:

    \d\d\d (using the find tool in your text editor, with regular expressions switched on) will highlight groups of 3 digits in a string

    \w\w\w\w\w will highlight groups of 5 word characters

    Notice how \w\w\w\w\w included numbers as well as letters

    \s\s will highlight double spaces
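    If you would rather see the matches listed than highlighted, the three examples above can be tried in a few lines of Python. This is just a sketch; the sample sentence is made up for the example.

```python
import re

# Made-up sample text with a double space after "shipped".
text = "Order 12345 shipped  on 2020-02-05 to koala_moon"

three_digits = re.findall(r'\d\d\d', text)
five_word_chars = re.findall(r'\w\w\w\w\w', text)
double_spaces = re.findall(r'\s\s', text)

print(three_digits)     # ['123', '202']
print(five_word_chars)  # ['Order', '12345', 'shipp', 'koala', '_moon']
print(double_spaces)    # ['  ']
```

    Matches do not overlap, which is why 12345 gives a single group of three digits, and the '_moon' result shows that \w counts the underscore as a word character.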

    Let’s look for words that have only 4 characters.

    A 4 letter word can be described as,

    “a space followed by any 4 characters, followed by a space”

    \s\w\w\w\w\s

    Which can be rewritten as

    \s\w{4}\s

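    But that will also include numbers. To ignore numbers, swap \w for a character class that only allows letters (use [a-zA-Z] if your search is case sensitive)

    \s[a-z]{4}\s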

    Not quite there. If you are playing along you will notice that we are highlighting 4 letter words and the space before and after, and that a word at the very start or end of the text is missed. What we need is to set boundaries.

    \b[a-z]{4}\b

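    \b is a word boundary: the position between a word character (\w) and anything that is not one, including the start and end of the text. It matches a position rather than a character, so the spaces are no longer highlighted. There are a few other boundaries, but for now let’s stay with this one. With that you can find all four letter words.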
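    To compare the space version with the boundary version side by side, here is a short Python sketch. The sample sentence is made up, and the case-insensitive flag is an added assumption so that capitals would count as letters too.

```python
import re

# Made-up sample sentence containing a 4 digit number and several 4 letter words.
text = "I need 2020 good four letter word examples here"

# Spaces on both sides: the spaces are part of each match, and some words are missed.
with_spaces = re.findall(r'\s\w{4}\s', text)

# Word boundaries with \w: no spaces in the result, but the number still matches.
with_boundaries = re.findall(r'\b\w{4}\b', text)

# Word boundaries with letters only.
letters_only = re.findall(r'\b[a-z]{4}\b', text, flags=re.IGNORECASE)

print(with_spaces)      # [' need ', ' good ', ' word ']
print(with_boundaries)  # ['need', '2020', 'good', 'four', 'word', 'here']
print(letters_only)     # ['need', 'good', 'four', 'word', 'here']
```

    The space version skips "2020" and "four" because the space in front of each was already used up by the previous match, and it skips "here" because nothing follows it. \b avoids both problems since it does not consume any characters.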