You may not remember the PL/I programming language (that’s a Roman numeral one, not a lowercase L), but I do. Well, not a lot. What I do remember is that the code looked like the programmer. That is, it looked like the programming language that the programmer already knew. While this is true of all programming languages to some degree, PL/I was designed with this in mind.
But, back to substrings. One substring of abcde is bcd. Depending on the language you’re programming in, the substring function or method is written differently. As a function, it is typically substr(string,start,end). As a method, it is typically string.substr(start,end). The differences are in the name of the function / method and in what start and end mean.
Let’s take Excel, for instance. If you’re writing a formula, the substring function is MID (for middle): MID(string,start,length). In this case, start is the starting position, counting from 1, and length is the number of characters to return. MID("abcde",2,3) returns "bcd". If you’re writing macros in Excel (VBA), it’s the same function, which is nice.
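To make the position-plus-length style concrete, here is a minimal Python sketch of a MID-style call. The function name mid and its error handling are my own stand-ins for illustration, not Excel’s or VBA’s implementation; the only behavior taken from the text is a 1-based start followed by a length.

```python
def mid(text, start, length):
    """Hypothetical MID-style helper: 1-based start position, then a length."""
    if start < 1 or length < 0:
        raise ValueError("start must be >= 1 and length must be >= 0")
    begin = start - 1  # convert the 1-based position to a 0-based index
    return text[begin:begin + length]


print(mid("abcde", 2, 3))  # bcd
```

Note the one conversion that matters: the position 2 becomes the index 1 before any characters are taken.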
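COBOL is similar in that it takes a start position and a length, although standard COBOL writes it as reference modification, STRING(START:LENGTH), rather than as a function. PL/I is the one that spells it SUBSTR(STRING,START,LENGTH). Apart from the spelling, the idea is the same.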
But what about the substring and slice functions? Python doesn’t even provide substring, only slice. How are they different?
The substring and slice methods use the same format, other than the name: string.slice(start,end). You can get fancy and use negative values for start and end. If you do, substring and slice work differently. I suggest that you not do that, because whoever has to maintain your code may get confused. In my case, that would be me.
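The text doesn’t say which language’s substring and slice it has in mind. Assuming it means the JavaScript string methods, the sketch below imitates their documented argument handling in Python so the difference can be seen side by side. The names js_style_slice and js_style_substring are hypothetical, and this is an emulation, not the real methods.

```python
def js_style_slice(text, start, end):
    """Emulates slice-style handling: a negative value counts back from the end."""
    n = len(text)

    def normalize(i):
        return max(n + i, 0) if i < 0 else min(i, n)

    s, e = normalize(start), normalize(end)
    return text[s:e] if s < e else ""


def js_style_substring(text, start, end):
    """Emulates substring-style handling: negatives become 0, reversed bounds swap."""
    n = len(text)
    s = min(max(start, 0), n)
    e = min(max(end, 0), n)
    if s > e:
        s, e = e, s
    return text[s:e]


print(js_style_slice("abcde", 1, 4), js_style_substring("abcde", 1, 4))  # bcd bcd
print(repr(js_style_slice("abcde", -3, -1)))      # 'cd'
print(repr(js_style_substring("abcde", -3, -1)))  # ''
print(repr(js_style_slice("abcde", 3, 1)))        # ''
print(repr(js_style_substring("abcde", 3, 1)))    # 'bc'
```

With ordinary non-negative values in the right order the two agree, which is a good argument for sticking to those.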
So, instead of specifying the length or number of characters to return, substring and slice specify the end index. However, they make it (IMHO) complicated. Start and end both refer to indexes, which count from 0. However, start is inclusive and end is exclusive, according to the Python documentation. What that means is that the start index is where the substring starts, and the end index is one character after where the substring ends. If you weren’t confused before, you probably are now :(.
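A short Python example may make the inclusive start and exclusive end easier to see. The variable names are just for illustration.

```python
text = "abcde"
start, end = 1, 4          # 0-based indexes: start is inclusive, end is exclusive

piece = text[start:end]    # takes indexes 1, 2 and 3; index 4 ("e") is left out
print(piece)               # bcd

# One upside of the exclusive end: the length is simply end - start.
print(len(piece) == end - start)  # True

# Another: cutting at the same index twice loses nothing and repeats nothing.
print(text[:2] + text[2:] == text)  # True
```

So the slice 1 to 4 and the MID-style call with position 2 and length 3 describe the same three characters.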
All of the programming languages that I researched (and I researched about 15) use one of those syntaxes to return a substring, though they might call the function / method something different. Lisp calls it subseq and uses the slice syntax.
So, how do you make one substring function that lets programmers use the syntax they’re used to? I think that’s important, because getting the parameters wrong can cause havoc. Trust me on that one.
You also have to know whether the programmer is passing positions or indexes (or lengths). I determine this with parms: x_substr(string,start,end,start_type,end_type). start_type and end_type are arrays. The first element of the array is pos, ind/idx, or str, for position, index, or string. Notice that I introduced a new way of returning a substring: identifying its boundaries with strings. For end_type, the first element can also be len, for length. The second element is bef, at/on, or aft, for before, at/on, or after. VBA syntax is at or on for start. Slice syntax is at/on for start and aft for end. Perhaps I should add sli for slice. For len, the second element should be '' (an empty string). In addition to adding str for either start or end, I also added bef. I figure that to be fully generic, if there’s an aft, there must be a bef.
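Here is a rough Python sketch of how the two type arrays could be turned into a plain pair of 0-based bounds. It is my reading of the description, not the actual x_substr code. In particular, I am assuming that bef/at/aft says where the value you pass sits relative to the boundary character (so aft on the end means the value points just after the last character, as in slice syntax), and that a str end marker is searched for from the start of the substring onward. The names resolve_bounds and _span are hypothetical.

```python
def _span(text, value, kind, search_from=0):
    """Return the 0-based half-open span [lo, hi) that a reference covers."""
    if kind == "pos":                # 1-based position
        return value - 1, value
    if kind in ("ind", "idx"):       # 0-based index
        return value, value + 1
    if kind == "str":                # first occurrence of a marker string
        lo = text.find(value, search_from)
        if lo < 0:
            raise ValueError(f"{value!r} not found")
        return lo, lo + len(value)
    raise ValueError(f"unknown reference kind {kind!r}")


def resolve_bounds(text, start, end, start_type, end_type):
    """Translate the caller's syntax into (first, stop), with stop exclusive."""
    lo, hi = _span(text, start, start_type[0])
    where = start_type[1]
    if where in ("at", "on"):
        first = lo                   # the reference is the first character
    elif where == "bef":
        first = hi                   # the reference sits before the first character
    elif where == "aft":
        first = lo - 1               # the reference sits after the first character
    else:
        raise ValueError(f"unknown start placement {where!r}")

    if end_type[0] == "len":
        stop = first + end           # a length needs no placement
    else:
        lo, hi = _span(text, end, end_type[0], search_from=max(first, 0))
        where = end_type[1]
        if where in ("at", "on"):
            stop = hi                # the reference is the last character
        elif where == "aft":
            stop = lo                # the reference sits after the last character
        elif where == "bef":
            stop = hi + 1            # the reference sits before the last character
        else:
            raise ValueError(f"unknown end placement {where!r}")

    return max(first, 0), min(stop, len(text))


# Four different ways of asking for "bcd" all resolve to the same bounds.
print(resolve_bounds("abcde", 2, 3, ["pos", "at"], ["len", ""]))         # (1, 4)
print(resolve_bounds("abcde", 1, 4, ["idx", "at"], ["idx", "aft"]))      # (1, 4)
print(resolve_bounds("abcde", "b", "d", ["str", "at"], ["str", "at"]))   # (1, 4)
print(resolve_bounds("abcde", "a", "e", ["str", "bef"], ["str", "aft"])) # (1, 4)
```

The point of resolving everything to one internal form first is that the extraction step never has to know which syntax the caller used.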
After I determine what the start and end indices actually are, I have to extract the substring and return it. I used a simple for loop to accomplish this. Using slice might have been slightly faster, but not enough to matter. And besides, I understand the for loop, sort of.
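For comparison, here is what a for-loop extraction might look like in Python next to the slice it replaces. The helper name extract_with_loop is made up for this example, and it assumes the bounds have already been worked out as a 0-based start and an exclusive end.

```python
def extract_with_loop(text, first, stop):
    """Copy the characters at indexes first..stop-1, one at a time."""
    chars = []
    # Clamp the bounds so an out-of-range request returns what exists.
    for i in range(max(first, 0), min(stop, len(text))):
        chars.append(text[i])
    return "".join(chars)


print(extract_with_loop("abcde", 1, 4))                   # bcd
print(extract_with_loop("abcde", 1, 4) == "abcde"[1:4])   # True
```

The loop is longer than the slice, but every step is visible, which counts for something when you are the one maintaining it.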