Tech Recruiter Tries Coding pt. 3— More GAS and DB-fication of sheets!

SEQUEL IS OUT! Recruiter Codes Chrome Extension — A LinkedIn Fairy(-tale)!

In part 2 of this growing (and growingly wordy) series, I talked about how to database-ify and CRM-ify spreadsheets, and a few things about how to code this stuff, elaborating on the basics described in part 1.
But when it came to actual code examples, I switched to a different topic.

This is because I didn’t want to bore you to death with routine stuff like indexing, managing of doubles, and such. 🥱

Trying to keep this merciful objective in view(!), l will try to re-take the subject and talk about some of this stuff, hoping to give good examples of the additional power of GAS (Google Apps Script)-powered spreadsheets over a database or a CRM (which, I love to say, is just a database frontend.)

So I will start with a few basics, without too much code, and then dive into a couple of code examples for the functions that I had more fun coding.

Basic steps to become a heretic DB-fier of spreadsheets

Now, I know I’m making you picture in your mind a lot of stuff since last time: it’s because if I would show you the actual spreadsheet, I’d have to transform it into a rave of censoring-happy emojis!

But you can imagine that managing double entries is quite easy, once everything has a unique ID.

Since all of our entries/rows are associated with at least one URL (if not an Email address, which is unique by definition), what I did was building a function that “sanitizes” the URLs by cutting out anything that can vary, keeping only the unique part, to avoid “false doubles” when the same URL is saved with different parameters or slashes, due to different sources, etc.

Pixelation: hot or not? 😀

Checking for doubles is not interesting at all, save for one nice thing to make searches much faster: using GAS’s Properties Service to keep a reasonably updated list of unique IDs and something more, always ready to use, instead of getting it from the sheet all the time, which is much slower.

Functions that create indexes of uniqueIDs (thus finding double entries) and read/store them with Properties, can be automatized easily with the time-driven triggers that GAS kindly offers, to run at specified times.

Just make sure you set them at times you’re less likely to do changes yourself, especially if you schedule double deletion, that removes rows from the spreadsheet: I learned the hard way! 😅 (Thanks to the Version History feature of Google Sheets, nothing is ever lost anyway, but still…)

The safest choice I found is Day timer, and giving intervals of hours you know are “safe”.

An interesting thing about running this stuff automatically, is that you’ll want some reporting/notification system that doesn’t use the UI, right? … Right?

Here’s the thing: alerts, dialogs, etc. won’t work in triggered scripts.
Google will actually prevent the whole script from running if it detects any, and for good reason: imagine discovering you lost tons of writing/editing, because a script started running in the meantime, and opened a pop-up..!

Therefore, to have time-triggered scripts report to you, you could set:

  • Email notifications from GAS itself (but this won’t report anything custom, it will just tell you if an execution ends because of errors.)
  • Custom emails that you send to yourself, from your own scripts.
  • Actually outputting on the sheet, though it better be in frozen cells.
  • separate web service to check the status of stuff.

For me, receiving extra emails is always bad, and outputting on the spreadsheet too, since it’s slow, forces me to dedicate “real estate” of the spreadsheet to this, and it also updates the sheet window/view when formatting is involved: not nice if I’m editing stuff in that moment.

So, since I had built a GAS web service for something else before, I was close to building one for this purpose as well, when I realized a simpler solution: we can use sheet names/tabs for reporting!

The tab becomes green when it reports no doubles! 🙂 (More on those icons on the right later…)

I built a small function for this, just to take care of doubles, and it’s a joy to see it at work, but the potential is even bigger!

It could be used as a dynamic warning sign for when other time-driven scripts are running, or to give dynamic tips… It can almost be animated, although slowly and only through text, emoji, and a single bar of color.

It’s basically a piece of the UI that, since it doesn’t receive frequent input from the user, has been left open even for time-triggered scripts to use.
It’s such a nice backdoor, that we better not spread the word about it, or Google might change the rules!🤫

There are a lot more related topics, some big and some small, but I think these were the most-interesting, among the ones I can show.
I encourage you to discuss the topic in the comments, or, if you’d like a full-fledged system of this kind, just contact me or FusionWorks about it!

Next, we dive into some code to see how to deal with entries that are not exactly doubles, but that should not be on separate entries either… A very specific case : separate job posts for the same jobs!

“The Piggyback”… GAS + conditional formatting!

Very often, companies will post the same jobs multiple times, so you will have the same title but different URLs: replicas of the same job, maybe for different countries, or more often just for added visibility.

These are important for Lead Generation since in sales-talk they’re additional “buyer intent data”. So keeping them separate, other than being a potential source of errors or double work for Lead Generators, is bad from a lead classification point of view: they should add to a lead’s score!

Now, making a custom check for this, even when indexing unique IDs, can be a bit clunky, since there are three conditions to check:

  • Same company
  • Same title
  • Different URL

But since I applied some conditional formatting to the spreadsheet, that highlights titles and companies appearing more than once, I thought of a different approach: what if the script could check the colors?

It should look at rows in which BOTH company name and titles are colored.

In this case, only the 3rd line would be one that MAY be a replicated job post, since both the title and the name of the company (censored) are highlighted, but actually it’s not the case: guessed why?

The condition by itself is not sufficient, as the tricky faces in the screenshot suggest: the title might be repeated by another company, like here the 1st one having the same job title as the 3rd one.

So we want cases where this occurs twice or more, to the same couplings.

Suddenly, the logic seems similar to the one of word frequency extraction, described in pt. 2!

I totally re-used the logic of word frequency extraction, but with some KEY differences, pun intended…

If you can look past the aggressive commenting, you see the condition of the colors being those of the previous screenshot at the same time, is used to build a “Duples{}” object in a very similar fashion as the wordFreq{} object used for skill extraction in pt. 2, with two main differences:

  • The keys are named after both the company name and job name combined. (To count together only those where BOTH match.)
  • The values are actually arrays, where we push “row+1”, which converts the index of the forEach loop into the actual row number where the condition is found. (Loops count from 0, Google Sheet counts from 1.)

This way, we end up with {‘company — job title’: [row1, row2, …]}

Or do we? Not right away! The second block of code, where we go through the keys of this new object, is where we filter out the cases where we found only single occurrences (false cases like the screenshot above), by checking the length of the rows array (called “post” in this loop), and straight up delete key-value couples with only 1 element, as they have no interest for us.

Instead, for each case where the length of the array (which is the number of rows respecting the condition) is 2 or more, we record a Total number of posts that will be consolidated, and the number of those to be deleted (TBD), for reporting purposes. (The nice popup message, below).

Work is not finished though. Apart from reporting, we wanna make sure we keep something of the latest post, merged into the first one (we want to keep the old post where it is, but with updated info from the newest one.)

The hefty part looks quite neat this time…

Here (if we hit “OK” previously) we proceed to the real work done by this function: another forEach loop on the keys of the object, which will record a “masterRow” as the first entry found (which as mentioned should be the one we want to keep), and a lastEntry, which is always the most recent.

Then we update the number of a column called “ConsolidatedPosts”, within the masterRow (corrected here in a contrary way because we go from actual row number to position in an array, so minus 1) adding the number of elements minus 1 (not to count twice the original row we keep.)

Then we have to copy one by one (a slow process… In milliseconds, at least 😏) relevant fields from lastEntry into masterRow.

Finally, we shift() the array of the entry in question, which removes the first element: we don’t need the masterRow anymore after it’s updated: we only need the rows to be deleted!

And finally finally we just write the values of ConsolidatedPosts all at once, which is much faster than doing those one by one.

Programmatically deleting rows & pseudo-trash bins

At this point we can delete. But wait! I don’t want permanent deletion: I fetched this data, so I wanna keep copies archived. 🤓 — I want a trash bin!

Thanks to the logical soul who shocked StackOverflow by showing the ease of deleting bottom to top! 😅

At the cost of copying the rows to be deleted into the unimaginatively-called Values[][] array, I can invoke my custom “throwToTrash(…)” function.

A few words about row deletion, which you might have noticed comes after sorting of the array… A simple loop to delete could have sufficed, right?

Wrong! For each deletion, all the rows below it will shift, and if we don’t adjust for this, on the next iteration we delete a wrong one, and then cascade like this till we delete all the wrong rows except the first one… (Thanks Google for the Restore Version feature!! 😅)

At first, I figured out a way to adjust for this, which was clunky as hell, because sometimes you have to adjust, sometimes you don’t, depending on where the previous deletion happened: below or above.

Then, looking for better ways to do this on StackOverflow, a kind soul there reminded the whole world to switch on their brain: just sort the rows from bottom to top before deleting, and you don’t have to adjust anything!! 🤯

Back to the trash function, I use it for many purposes, like removing posts from incompatible companies, but without losing track of them. Here it is.

A few neat tricks here as well, but nothing fancy, aside from that long line after “if (RichData)”… Details below!

It checks for a sheet called “🗑️”, literally the emoji for the trash bin. 😄

If it finds it, it’s ready to copy the stuff we pass through the function’s arguments (with types specified, so we avoid messing up, thanks to TypeScript!) Otherwise, it creates a news one programmatically, copying the header from the sheet we specify, before proceeding to do the rest.

Here the only neat thing is combining setRichTextValues() (needed for links) and regular setValues() (since I don’t want to format stuff just to make it Rich Text.) It was quite a headache in the beginning of my GAS days!

Because the problem is they will overwrite each other even when they don’t carries values, effectively deleting stuff. So you want to separate the ranges you apply them to.

My solution to this issue in ONE code line [after if(RichData)] inspired me a childish “KLASS!” comment, on the right: it basically chains a call of setValues after RichTextValues, modifying the range in-between, and mapping values to take only a slice of columns.
I confess: this used to take me many lines before… It felt so good to find a way to make it a one-liner!! 😆

What you actually want to read is here!

Once again I got carried away, so I hope the non-coders among you (and the many coders for whom this is easy stuff) skimmed carelessly till here.

While last time the focus was performance (and how sometimes it should not be the focus), this time the lesson is more abstract, and it’s been swimming in my mind for some time, as mentioned in the previous post.

In short: sometimes the UI you want to build is already in front of you!

It tickles my mind every day now: there is so much that can be done, programming on top of programs that are already useful, like Google Sheets. The cases above, of using conditional formatting to help a script, sheet tabs as notification space, or entire sheets for trash collection, are small but valuable examples of this.

But it goes beyond UI, like when I saw I could basically cache stuff on the document/script properties for fast access; there are so many things that can work with/for you, and you just have to connect them.

All in all I feel like I just barely scratched the surface of a new world!

Not such a flat world, but you get the picture!

This cloudy (pun intended) thought first came to me when I created my first (and for now only) Chrome Extension for recruitment, which I will talk about in the next post/NOVEL I will write… (I keep postponing it!😅)
It basically has no UI, using Chrome’s UI to do extra stuff on top of the usual: new functionality for UI elements I had for free, and had to use anyway…

This is why last time I closed with the following quote, and I’ll do so again:

We are like dwarfs on the shoulders of giants, so that we can see more than they, and things at a greater distance, not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up by their giant size.

— Attributed to Bernard of Chartres by John of Salisbury, circa 1159

In the IT world, this is true on so many levels, but I feel like I discovered a “new level” of this: one that is not at the very top of the scale, the user level, but also not at the bottom, the develop-from-scratch level: it’s somewhere inside a seldom-explored middle-ground, where we can squeeze everything out from existing software, and build on top, to actually reach higher levels of productivity than with vanilla stuff, faster than developing from scratch.

Once again, magic made possible by FusionWorks!! 😎

Tech Recruiter Tries Coding pt. 2 — Google Apps (Type) Script

UPDATE: Part 3 is out!

UPDATE 2: Check the sequel! Recruiter Codes Chrome Extension — A LinkedIn Fairy(-tale)!

Hello again from the world of amateur-ish coding! 😄

In the first part of this now-serial blog, I talked mostly about how to automate recruitment statistics in Google Sheets. But statistics, even when classifying lots of data with lots of rules, are rather trivial in coding terms.

To get into something slightly more advanced, I’ll venture a bit out of the Recruitment domain, and get into what’s usually called Lead Generation.

Retro-engineering the over-engineered world of Lead Generation

For a company that, like FusionWorks, offers awesome software development services, a “lead” is actually quite related to Recruitment:

  • A job post for a software development role, since it’s a need it can fulfill.
  • Contact details for relevant people in the company that posted it.
  • Info about the job, the company, and its people, to classify the lead.

Now, most companies tackle this part of the business doing the following:

  • Buying leads data, from companies that collect it in bulk, somehow…
  • Importing them into some expensive CRM SaaS solution aimed at sales.
  • In such CRM, have one page for each lead where you can change its status, comment on it, and have links for emails, websites, etc.

If it seems overkill, it’s because it is! The truth is CRMs are just fancy interfaces to do basic operations on a database. Actually the simplest of operations, in terms of the underlying logic: basic CRUD (Create, Read, Update, Delete.)

The prize for the most over-engineered CRM goes to… HubSpot! Also thanks to its sales rep ghosting me! 😆 … The No-Bullshit logo instead is from FusionWorks’s manifesto. 😎

And about bulk leads data, let’s just say it’s some doubtfully accurate (and doubtfully legal, in many cases) stuff to populate the database with.

The best of databases without databases (and why this matters)

At its core, a (relational) database is nothing more than a collection of tables, with some codified relations between them.

It might seem reductive, but if you add some validation rules and scripts to sheets on Google Sheets or Excel, where each sheet can be the exact equivalent of a database table, what you get is a relational database, but one that even people without database experience can work on, and with a UI that can be as fancy and usable as that of CRMs, if not more…

A single, simpler, much cheaper solution can kill two techs/products! ✌️

And I see I’m not the first to think about it! Mine looks better, although you’ll have to take my word for it… 😁

Reasonable people might disagree, pointing out that I oversimplified databases, and didn’t account for the fact that databases will always offer more possibilities, and be faster, compared to spreadsheets acting as databases.

These would be valid counterpoints if not for a few details:

  • For Lead Generation, the extra PROs of databases are largely unneeded.
  • Perhaps worse than that, direct database usage (without a CRM as a frontend) would raise the learning curve a lot for Lead Generators.
  • Using a CRM, you lose a lot of the flexibility and power proper of both databases and spreadsheets, AND you pay a premium for it!

While following courses on databases, it dawned on me that the only characteristic of databases one could really want for Lead Generation, is not even exclusive to databases, nor always applied to them: Database Normalization, in particular the first three “Normal Forms”.

Boring, huh? The source article is not so bad, but I learned about this in a course where it’s explained better.

Don’t bother looking these up unless you’re into this stuff: most explanations are crazily formal (pun intended). For us this translates mostly into avoiding duplicate contacts or leads: it could lead to bad mistakes in Lead Generation!

To achieve this and more. we could plan things in the following way:

  • Multiple sheets (like the tables of a database) separating jobs, companies, and contacts, one row per entity.
  • Each entity associated with a unique identifier, so we can delete doubles without relying on names and titles. [Important not only for contacts: so many different jobs in different places can be titled“JavaScript Developer”!]
  • Some scripting to keep all of this together, and especially populate the database with updated info, without someone having to manually write or copy-paste tons of data, which is alienating as hell.

About the last topic, automatic lead data collection, I won’t reveal how we did it because it falls within the textbook definition of business secret… Suffice to say it requires specific research and coding: if you’d like to try it out, you can contact either me or FusionWorks, and we’ll be happy to help!

But about the rest, I will talk about some examples, and about how it’s all easily developed from the comfort of VS Code, using… TypeScript!

 Skip/skim the next section, if you’re not into coding! 

Going TypeScript without even knowing much of JavaScript?!? 🤯

Having started to code this stuff using the regular Google Apps Script web IDE, it took me some time and courage to switch to VS Code, and even Git: it’s the stuff of real developers. Very far out of my comfort zone!!

But many creators on YouTube that talk about Google Apps Script (usually abbreviated GAS) were endorsing this method, and as an avid consumer of these videos, I decided to give it a try, and… Oh boy, was I missing out!

The trick is using a tool Google made, called CLASP (for Command Line Apps Script Project), which not only allows for pushing and pulling to/from GAS seamlessly from VS Code, (or whatever other IDE you like to use, I guess) but also automatically transpiles TypeScript code into the best JavaScript GAS can use. This is HUGE, even for a noob like me. Why?

I had just started using TypeScript to build a custom Google Chrome extension (topic for a later post in this series I guess, even if chronologically it came before this), and what I noticed is that, by simply changing JavaScript files to TypeScript on VS Code, the main difference is you are forced to be more precise, which avoids running into stupid errors!

You will still get hit by your own stupidity, but at least you’ll get to do some cool stuff before that! 😄

Therefore, to my surprise, it is actually easier to code in TypeScript than JavaScript, even more so for a beginner like me, because:

  • There is not much to “unlearn” from JS, if you just started anyway.
  • You’re alerted when values can be of a type that would cause errors.
  • You’re forced to do something about it, basically debugging in advance.

Therefore, from now on, all the code I will share will be GAS, but written in TypeScript, even if most of the time there will be little to no difference.

 Skip/skim the next TWO sections, if you’re not into coding! 

Of batch-extracting words from webpages, and ancient coding…

Say you have a lot of leads you “magically” get every day (ask in private 🥸) into your Google Sheets, but you want to know at a glance what’s there, without even opening the link. And say these links are to job posts, so this at-a-glance info you want, is the skills the jobs require.

Since there’s no standard format for job posts, you need to parse the webpage word by word, and extract the words looking like skills—possibly counting how many times each is mentioned, to see which is more or less important.

Now, since I learned the basics of programming in ancient times, and then completely stopped until recently, I always tend to think the tasks I give to my code could be too big, and will make the application/script too slow… 🙄

Happens when you last coded in 2002! Here’s my FIRST coding article, app, and website… I was 15…! 🥲

I know GAS even has a limit for how long a script can run, so if I hit it, the results might not arrive, or data might get corrupted! So performance was my top concern: how would I give such a huge task to this thing, looking at the text of so many pages, without risking slow-downs..? 🤔

It may be more obvious to you than to me: in 2022, this way of thinking of mine is completely unjustified, and outdated!! 😅 It’s totally possible to split even a long webpage in all of its words, count the occurrences of specific words, and put the results into a Google Sheet note that magically appears next to the link, in less than a second per page!

With the knowledge I had when I wrote the first article, I would have probably tried to tackle this using a double array of keywords and the number of occurrences, with some crazy nested iteration to go through it… 🤔

The fact JavaScript is object-oriented, and I could have used objects where key-value pairs constitute the skill words and their frequency on a page, would have seemed to my past self of a couple of weeks ago like a sure way of overloading the script and slowing everything down! 😱

Needless to say, it turned out even this concern of mine was archaic BS, and working with objects in JavaScript is not slower than working with arrays in any meaningful way, so using objects, code can become much more readable and less convoluted, without perceivable performance drops!

Batch-extracting skills from job posts: finally the code!

To begin with, I wanted a way to easily change the dictionary of terms I was going to search for with this custom function, without having to change the code, so that Lead Generators could do so without any coding. 🫶

Since we are on Google Sheets, this is as easy as creating a sheet with one word per row in one column, and maybe my beloved checkboxes in the next column, so that we can take stuff in and out of the dictionary without having to delete and rewrite, just by checking or unchecking the boxes.

Here’s an example. Case sensitivity is a bit of a pain, but easier to tackle here than in the code!

Here’s the code that reads that sheet, and creates an array with each word:

Then comes the nice part:

First of all it takes the function’s argument, content (the actual text of the webpage) and splits it into an array of words (inputWords) for easy parsing.

Then within the forEach loop, it builds the previously empty wordFreq{} object, adding keys named like the words/skills found, and associating to them a value that builds up to the number of times the word is found.

Note that this is done with a nested forEach loop on the previously built array of keywords to be found (called “search”), and works in two ways: if the search term is listed in double quotes, it will be found only if it’s a stand-alone. If not, also when attached to other words/characters.

Useful because otherwise “Java” would always be included in every “JavaScript” job post… 😄 But not if we search for it with the double quotes!

I know, this is not perfect for every case: looking for stand-alone words will fail if even just a coma is attached. I could easily take punctuation out from the content before starting the search, but for now this works like a charm.

Finally, let’s make this something that can look nice in a Google Sheet note.

The first fancy line here sorts the key-value pairs within the object by the size of their value, to sort skills by frequency, instead of the order they’re found in. It breaks the object into an array, sorts it, puts it back together.

This is one of the things that my “ancient coding mindset” would consider cutting out, to gain some speed. After all, the most frequent keyword is usually found first anyway, and even if not, the higher number will make it pop… 🤔 But again: a couple of seconds is not much, while a consistent order of keywords by frequency is, so I decided to trust modernity!

After this fancy sorting, another forEach loop on the object’s keys makes us build a string that is formatted nicely, with line returns and everything.

Finally, if no words are found, we return another nicely formatted note to be attached to the job post, explaining possible reasons.

Here’s an output: the note on the right, popping up from the job column.

And here the function that does this to all rows selected with checkboxes:

Straightforward: looks at the checkboxes I added in each row to perform a “sticky selection”, looking at the links column only on rows where boxes are checked, and adds a note with “setNote(extractSkills(content))” on every entry where we get a positive “200” response code from the HTTP request.

You might find that your favorite jobs site blocks requests from Google Apps Scripts, and once again, if you want to know the secrets to make this work, contact me or FusionWorks, and we can set it up for you… For a price. 😏

Still, this is not perfect, I know. For example, I just noticed that reading and building the dictionary of searched terms within the extractSkills() function is very inefficient: it should be read only once, and passed to extractSkills(), to avoid re-looking at the dictionary every time a page is analyzed.
Finally an example of my ancient mindset being useful! 😎

P.S.: I tried, and indeed the the time was reduced by around 30%, down to 5.8 seconds to search/extract 40 keywords in 10 job posts! Sometimes thinking about performance pays. Out of curiosity, I tried taking away the sorting too, but no effect there, proving again that my ancient mindset needs an update! 😄

Lessons learned, AKA the only part worth reading!!

I really got carried away with the details above, so I hope the non-coders among you (and even most of the coders) skimmed till here.

In short: Google Apps Script written in TypeScript and managed with CLASP+Git on VS Code is an amazingly powerful thing!

I learned that while my concerns for performance sometimes lead to nice gains, they’re more often than not just fossils of my “90s computing mindset”, and I often tend to overestimate them: they surely don’t justify writing ugly code, or avoiding sophisticated stuff just to save some CPU cycles. The CPUs are Google’s anyway, and we use them for free! 😆

As long as the time it takes for scripts to chew your data is not too much for you personally, nor the MINUTES that Google puts as a limit, you’ll be fine.
This stuff saves HOURS of work, not seconds. But you shouldn’t spend hours writing “perfect” code. Keeping it clean is more useful!

I also learned you can easily make something better than most commercial CRMs, with the most important advantages of Databases (since CRMs are nothing but frontends for Databases, I can’t repeat it enough): you just need Google Sheets plus a few hundred lines of code, that are quite quick to write, since with spreadsheets there’s little abstraction: you can almost touch an array, if you’re ok with fingerprints on your screen… 😜

There was no meme about touching arrays… Obviously! 😅

The most valuable lesson though, is something a bit abstract indeed, and I’m not sure I fully grasped it yet, so I’ll leave it to the next article, closing with a quote I remembered from my distant humanistic background:

We are like dwarfs on the shoulders of giants, so that we can see more than they, and things at a greater distance, not by virtue of any sharpness of sight on our part, or any physical distinction, but because we are carried high and raised up by their giant size.

— Attributed to Bernard of Chartres by John of Salisbury, circa 1159

It’s fitting on so many levels!
But for now I’ll close by thanking the best of the giants carrying “dwarf me” on its shoulders: FusionWorks! 😎

UPDATE: Part 3 is out!

Check the sequel! Recruiter Codes Chrome Extension — A LinkedIn Fairy(-tale)!

Monitoring applications in several Kubernetes clusters with Prometheus, Grafana and Loki

Monitoring and logging are important parts of application development and support. This statement relates not only to the production environment but to staging and others as well. What would be a solution to have all the metrics and logs from different environments or clusters in a single place?

Let’s assume that you have 2 separate K8S clusters for our environments. We don’t want the additional load in our main cluster, but still want to get the required metrics and logs from there.

The solution below will keep all the parts that can create any additional load in one cluster[Cluster #1 in the diagram] and have only the required tools in the main cluster[Cluster #2]. Please note that all the blue parts are in the monitoring namespace.

Monitoring toolset diagram

A couple of words before we start — we use the Prometheus stack for monitoring. Prometheus is a standard de facto in K8S world. It’s going to store all the metrics we get from the applications. Grafana will help us with making all these metrics and logs visible. Loki is an open-source software for logging. We also use some other tools like MinIOPromtail, etc.

Install MinIO

MinIO is an Object Storage solution compatible with AWS S3 API, which would allow us to store our logs and other data directly in our cluster.

First of all, we have to create our monitoring namespace:

kubectl create namespace monitoring

Now we can install MinIO with a Helm chart:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install minio bitnami/minio --set="persistence.size=20Gi" -n monitoring

An important note — make a rough analysis of how many logs and metrics your application will generate and want to store simultaneously. The example above create a persistence volume with 20Gb space.

Install Loki

Loki is our solution for logging aggregation that we are going to use. We are going to connect it to MinIO so our logs would be hosted in MinIO.

To install Loki we are using its Helm chart:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install loki grafana/loki-distributed -n monitoring -f .\loki-values.yaml

But this time to adjust more parameters instead of a simple — set key, we are using a .\loki-values.yaml file:

loki:storageConfig:aws:
s3: null
s3forcepathstyle: true
bucketnames: loki
endpoint: http://minio:9000
region: null
access_key_id: {ACCESS_KEY_ID_FROM_MINIO}
secret_access_key: {SECRET_ACCESS_KEY_FROM_MINIO}
insecure: trueboltdb_shipper:
shared_store: s3
active_index_directory: /var/loki/index
cache_location: /var/loki/cache
cache_ttl: 168hfilesystem: {}schemaConfig:configs:
- from: 2020-09-07
store: boltdb-shipper
object_store: aws
schema: v11
index:
prefix: loki_index_
period: 24hgateway:ingress:
enabled: true
ingressClassName: nginx
hosts:
- host: {LOKI_HOSTNAME}
paths:
- path: /
pathType: ImplementationSpecific
tls:
- secretName: logs-tls
hosts:
- {LOKI_HOSTNAME}basicAuth:
enabled: true
username: {LOKI_USER}
password: {LOKI_PASSWORD}

Install Promtail

Promatil is one of several log agents supported by Loki, it is the default one, so we are going to use it.

The installation again would be done with Helm chart:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install promtail grafana/promtail -n monitoring -f .\promtail-values.yaml

And as in previous example with Loki, we are gonna use a values file:

config:
clients:
- url: https://{LOKI_HOSTNAME}/loki/api/v1/push
basic_auth:
username: {LOKI_USER}
password: {LOKI_PASS}

Install Prometheus

Prometheus is an alerting and monitoring solution that allows us to collect and store metrics on our resources.

And it would be installed with a Helm chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack -n monitoring -f .\prometheus-values.yaml

And this is the values file we are using to configure Persistent Storage, Alert Manager Ingress, and disable default Grafana as we are going to use a standalone one.

prometheus:
prometheusSpec:storageSpec:
volumeClaimTemplate:
spec:
storageClassName: do-block-storage
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 10Gialertmanager:
ingress:
enabled: true
ingressClassName: nginx
hosts:
- {ALERTING_HOST}
paths:
- /
pathType: ImplementationSpecific
tls:
- secretName: alerts-tls
hosts:
- {ALERTING_HOST}grafana:
enabled: false

Install Grafana

Grafana allows you to query, visualize, alert on and understand your metrics no matter where they are stored. Create, explore, and share beautiful dashboards with your team.

And in order to install it, we are using a Helm chart:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install grafana grafana/grafana -n monitoring -f .\grafana-values.yaml

To provide the configurations we need, we are using this values file. You might notice that we pass OIDC configuration to Grafana to enable login with the Google authentication method. It’s the auth.google part of the config.

datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Loki
type: loki
url: http://loki-loki-distributed-query-frontend.monitoring:3100
- name: Prometheus
type: prometheus
url: http://prometheus-kube-prometheus-prometheus.monitoring:9090ingress:
enabled: true
ingressClassName: nginx
hosts:
- {GRAFANA_HOSTNAME}
paths:
- /
pathType: ImplementationSpecific
tls:
- secretName: monitoring-tls
hosts:
- {GRAFANA_HOSTNAME}grafana.ini:
server:
root_url: https://{GRAFANA_HOSTNAME}
auth.google:
enabled: true
client_id: {CLIENT_ID}
client_secret: {CLIENT_SECRET}
scopes: https://www.googleapis.com/auth/userinfo.profilehttps://www.googleapis.com/auth/userinfo.email
auth_url: https://accounts.google.com/o/oauth2/auth
token_url: https://accounts.google.com/o/oauth2/token
allowed_domains: {DOMAIN_FOR_OAUTH}
allow_sign_up: truepersistence:
enabled: true

Add another cluster to the monitoring

To add another cluster to the monitoring all we have to do is to install the agents on this cluster and expose endpoints to allow communications:

  1. Create a monitoring namespace in the second K8S cluster
  2. Install Promtail as described previously
  3. Install Prometheus as described previously
  4. Expose the second cluster Prometheus with an Ingress
  5. Set first cluster Prometheus to federate scrape exposed endpoint

To set Prometheus to federate mode we have to upgrade it by adding additional scrape configs to the values file:

additionalScrapeConfigs:
- job_name: 'federate'
scrape_interval: 15shonor_labels: true
metrics_path: '/federate'params:
'match[]':
- '{__name__=~".+"}'static_configs:
- targets:
- {SECOND_PROMETHEUS_HOSTNAME}

Next steps

Once you’ve got all the parts installed and configured, your next step will be Grafana dashboard configuration to ease your usage experience.

How to set up your Hashicorp Vault with Google authentication?

Hashicorp Vault is a tool that allows you secure, store, and control access to different types of secrets, like credentials, certificates, tokens, etc. It could be used to share secrets within teams as well as be incorporated into CI/CD pipelines.

Installing Hashicorp Vault

The first step is to add official HashiCorp repo to apt sources:

apt-get update
apt-get install gpg wget
wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg >/dev/null;
gpg --no-default-keyring --keyring /usr/share/keyrings/hashicorp-archive-keyring.gpg --fingerprint;
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list;

Now we can install Vault itself:

apt-get update
apt-get install vault

Configure Vault for the first startup

Once Vault is installed go to /etc/vault.d/ directory and edit the vault.hcl file. Replace it with the example below. As a part of general practice, we set Vault to listen to the internal IP address and will expose it to the outside world using Nginx later.

ui = truedisable_mlock = truestorage "file" {
path = "/opt/vault/data"
}api_addr = "http://{ PRIVATE_SERVER_IP }:8200"listener "tcp" {
address = "{ PRIVATE_SERVER_IP }:8200"
tls_disable = "false"
}

Once it’s done, run the command below to check that the config is fine.

vault server -config=/etc/vault.d/vault.hcl

If everything is fine, execute the next command to run Vault as a service and check its status. Please note that these commands might vary depending on Linux distribution you use and what process management software is installed.

systemctl start vault
systemctl status vault

After this, you MUST init Vault server with the command below. This command will respond with seal keys and an initial root token. Ensure you keep them in a safe and secure place. These credentials would require to access Vault in order to configure OIDC authentication and further setup.

vault operator init

Configure Nginx proxy

As it was mentioned before we use Nginx as a proxy before Vault server. Let’s install it first:

apt-get update
apt-get install nginx

And configure it using the template below. You can use cert-manager to generate valid SSL certificates.

upstream vaultui{
server { PRIVATE_SERVER_IP }:8200;
}server {
listen 80;server_name { VAULT_URL };location ~ "^/\.well-known/acme-challenge/(.*)$" {
default_type text/plain;
return 200 "$1.{SSL ACME challenge response}";
}
return 301 https://$host$request_uri;
}server {
listen 443 ssl http2;
server_name { VAULT_URL };ssl_certificate /etc/letsencrypt/live/{vault url}/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/{vault url}/privkey.pem;location / {
proxy_pass http://vaultui;
}location @fallback {
proxy_pass http://vaultui;
}
}

Configure OIDC (Google)

Next step is to enable login methods. The main login method will be “Login with Google“. We will need to create a default role, default policy and configure OIDC auth.

The first is policy. It set the minimum required permissions for users.

# Allow tokens to look up their own properties
path "auth/token/lookup-self" {
capabilities = ["read"]
}# Allow tokens to renew themselves
path "auth/token/renew-self" {
capabilities = ["update"]
}# Allow tokens to revoke themselves
path "auth/token/revoke-self" {
capabilities = ["update"]
}# Allow a token to look up its own capabilities on a path
path "sys/capabilities-self" {
capabilities = ["update"]
}# Allow a token to look up its own entity by id or name
path "identity/entity/id/{{identity.entity.id}}" {
capabilities = ["read"]
}
path "identity/entity/name/{{identity.entity.name}}" {
capabilities = ["read"]
}# Allow a token to look up its resultant ACL from all policies. This is useful
# for UIs. It is an internal path because the format may change at any time
# based on how the internal ACL features and capabilities change.
path "sys/internal/ui/resultant-acl" {
capabilities = ["read"]
}# Allow a token to renew a lease via lease_id in the request body; old path for
# old clients, new path for newer
path "sys/renew" {
capabilities = ["update"]
}
path "sys/leases/renew" {
capabilities = ["update"]
}# Allow looking up lease properties. This requires knowing the lease ID ahead
# of time and does not divulge any sensitive information.
path "sys/leases/lookup" {
capabilities = ["update"]
}# Allow a token to manage its own cubbyhole
path "cubbyhole/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}# Allow a token to wrap arbitrary values in a response-wrapping token
path "sys/wrapping/wrap" {
capabilities = ["update"]
}# Allow a token to look up the creation time and TTL of a given
# response-wrapping token
path "sys/wrapping/lookup" {
capabilities = ["update"]
}# Allow a token to unwrap a response-wrapping token. This is a convenience to
# avoid client token swapping since this is also part of the response wrapping
# policy.
path "sys/wrapping/unwrap" {
capabilities = ["update"]
}# Allow general purpose tools
path "sys/tools/hash" {
capabilities = ["update"]
}
path "sys/tools/hash/*" {
capabilities = ["update"]
}# Allow checking the status of a Control Group request if the user has the
# accessor
path "sys/control-group/request" {
capabilities = ["update"]
}# Allow a token to make requests to the Authorization Endpoint for OIDC providers.
path "identity/oidc/provider/+/authorize" {
capabilities = ["read", "update"]
}

Next step — create a role. The example below will get user emails and set them as alias in Vault. Also we set additional mappings for claims like name, email and sub as alias metadata to get more info about user.

vault write auth/oidc/role/{role name} \
user_claim="email" \
claim_mappings="name"="name" \
claim_mappings="email"="email" \
claim_mappings="sub"="sub" \
group_claim="groups" \
bound_audiences={OIDC Client ID} \
allowed_redirect_uris={Callback URL} \
policies=default \
oidc_scopes="email,openid,profile" \
max_age=0 \
ttl=1h

Finally — create an OIDC config. We use Google as OIDC provider, please refer to this page to get more information.

vault write auth/oidc/config -<<EOF
{
"oidc_discovery_url": "https://accounts.google.com",
"oidc_client_id": "{OIDC Client ID}",
"oidc_client_secret": "{OIDC Client secret}",
"default_role": "{role name}"
}
EOF

That’s all — your Vault installation is ready to use.

Our Tech Recruiter Tries Coding — Google Apps Script

UPDATE: Part 2 is out!
UPDATE 2: 
Part 3 is also out!
UPDATE 3: 
Recruiter Codes Chrome Extension — A LinkedIn Fairy(-tale)!

After years of working adjacent to software development, first in sales and then in recruitment, I abruptly decided to take the plunge into the other side, and start coding to automate or speed up tasks of my everyday job.

The idea came by when I started using Google Sheets more heavily, and realized how powerful it is by itself, and how much more it can become with the addition of Google Apps Script, which I discovered to my surprise is actually modern JavaScript.

A bit of background (you can skip this)

I have a distant, small, and self-taught programming background, dated almost 25 years, in times when there was no stackoverflow.com, but a lot of stack overflow errors popping up on Windows… I learned Basic first as a kid, on an already-ancient Commodore 64, and following the advice of a cousin who was studying software engineering, switched directly to C++, to write desktop applications for a long-gone (but beautiful) operating system named BeOS, which was so nice it was remade from scratch by fans into an opensource OS called Haiku.

When my platform of choice died I was 15, and while I had managed to develop a couple of more-or-less useful applications, other than personal websites using then-new techs like XHTML 1.0 (HTML 4 + XML), SSIs, and CSS2, I decided to quit programming for good, thinking that anyway I wanted to work in a different sector. Many people told me I was a fool to waste what I had learned, and not continue further, but from the height of my teenage wisdom, I thought I knew better… Obviously I didn’t, but the choice was made, and I decided to study Earth Sciences (Geology) at university, and never wrote code again for 20 years.

Fate had a sense of humor, and I never actually worked in geology, with my first office job being already closer to IT, and each subsequent job getting closer and closer to it. It seems like it was destiny!

This is how I got to FusionWorks — an IT company with 11-year history and 57 clients in 18 countries. Here I’m a geekruiter — not sure because I’m considered to be a geek or I need to recruit geeks — they don’t tell me.

Learn while working: JS, HTML, CSS, all within Google Apps Script

During the last couple of years of working closely with web developers, I often wondered what it would take to learn JavaScript. I still remembered the basics of HTML and CSS from my teenage years, but for some reason, JavaScript felt like a scary step. Maybe it was because I knew how much JavaScript developers were sought-after in the job market, or because when I looked at it, I felt like it was touching on too many elements I didn’t know, and that it would have taken me too much time and effort to learn.

But when I started using Google Apps Script from Google Sheets, at first using the resources of the great Spreadsheet Dev website, I realized that JavaScript came pretty natural to me, if it was used to manipulate the stuff I used every day on Google Sheets. The scope of what you can do is small at first, but the more you progress, the more you learn JavaScript, and can even add HTML and CSS to the mix if you’re into creating custom UIs.
In my first hours of excitement, I was going in that direction, and I will cover this briefly here, although I later abandoned this path because I discovered more efficiency in getting rid of UI, instead of adding to it.

Btw, as being learning-driven is one of FusionWorks’ main values, they support my desire to code while recruiting.

I will now detail a few useful things you can do with Google Apps Script, without any previous knowledge of JavaScript, and possibly nothing but the very basics of programming in general.

Custom buttons to do simple operations (safely skippable)

Perhaps the easiest and most straightforward way to learn Google Apps Script and put it to good use is creating custom buttons. It’s a two-part process: writing a small script, and then associating it to a drawing, where you can create an actual button (or if you’re lazy like me, write an emoji into it that will work as a button.)

To start writing your scripts, you head over to “Apps Script” within the “Extensions” menu:

If you never used it before, you will find yourself on a page like this:

Or if like me you first started messing around with macros, thinking it might be easier (it isn’t), it might be called “Recorded Macros (your sheet name)”. Here below is the first function I wrote, which saved me some clicks when needing to add 1 to a value, back when I was counting things manually…

If you know anything about coding, the above will feel pretty obvious and you can safely skip the whole section, but I will talk about it a little bit for those that like me started from scratch.

First, let’s make the code more visible with a Gist (I just googled this):

https://medium.com/media/b76700b27edaa2b8339a86ef7c34a0f8

What this does is described in the first comment line, and I was going to write a line-per-line explanation as well, for those of you who are beginners like I was when writing this, but this article would become super-long, so feel free to ask for details in comments, and I will be glad to go over it.

In any case, it’s all very well-documented by Google, and if the documentation seems a bit difficult at first, head over to Spreadsheet Dev, and look at the many how-to’s, which are explained in a much less technical way.

After writing this and saving it, you can test it by associating it with a button, which as mentioned is created from a drawing:

Which can even contain just a big emoji inside a text box, to make it quick:

Careful with the text box size: all its area will be clickable as a button, so make it very close to its content!

After you save it, it will appear in your spreadsheet and you will be able to associate the function you made, by clicking on the three dots appearing when you select it (they might overlap the contents and be difficult to see like here below), and clicking on Assign script, where you will just type the name of the function you just created.

And that’s it, a small step for a coder, but a big step for a noob!

Now, this is useful in very few scenarios, mainly when working in split screen and not wanting to switch windows a lot when counting some stuff.

But what if we want to make sure we update a specific cell based on some parameters, and not whatever is currently selected? In particular, let’s see how to do it in case what we need to count is based on the day of the week:

https://medium.com/media/4c921976dae5870c9ec098bafdc26459

This one teaches something non-trivial: arrays and how they work, plus working with dates.

(Feel free to skip this if you understand how it works.)

  • We create an array with “Const Days = [‘B’, ‘C’, ‘D’, ‘E’, ‘F’]”.
    If you’re a beginner, you can think of this like an ordered list, where those letters will always be in that order, and be accessible by their position, with the first being in position number 0.
    For example, Days[1] will refer to ‘C’, while Days[0] to ‘B’.
  • Note that these are meant to refer to cells, which in the so-called A1 notation are described by a letter for the column, and a number for the row. The columns in these cases referred to days of the week, with B being Monday, and F being Friday (by chance.)
    So how to let the script know where to write, based on what day it is?
  • By playing with the array index and the “getDay()” function, which gives a number depending on the day, from 0 for Sunday to 6 for Saturday.
    Therefore, if we want Monday to be associated to letter B from the “Days” array above, all we need to do is subtract 1 from the getDay() value: like in the line “const Day = Days[Today.getDay()-1];”
  • The next line will get a Range (a group of cells, in this case just one cell) by associating the letter we stored as “Day” to a number of the row we need (in my case it was 50), using the easy way JavaScript gives to build a string from variables, chaining them with the + sign, like in “getRange(Day+50)”.
    Like this it will be B50 for Monday, C50 for Tuesday etc.

Enough with the basics, what about useful things?

This stuff was useful for a stats sheet I had, but quickly became outdated when I discovered how much more could be done with Google Apps Script.
For example, why writing the statistics myself, adding stuff manually, when I can just calculate statistics automatically?

Here the code becomes quite long, and I can’t show you the sheet it works on since it’s where we keep all of our recruitment stuff, but here is a snippet from a function that classifies the daily work done in various arrays, perfect for compiling automated statistics:

https://medium.com/media/8210698494322a092e17e476c8c7cb18

There would be quite a lot to write about this, but in short, you can see the following interesting bits:

  • You can use sheets to store settings which you use to process other sheets! This is much easier than using databases, and with some setup work, it lets you change what a script does without touching its code anymore.
  • If you organize your sheet with some default values that always start in the same way, like in this case my “statuses” starting with numbers or letters, you can check their values easily and quickly by looking only at the first character of the cell value, instead of the whole cell.
  • Using the push() method of arrays is great for counting things and saving them in lists at the same time.
  • Using arrays instead of separate variables for concepts like contacts that are new, to-be-contacted (TBC in the script) and contacted, I am then able to procedurally go through them by cycling through their index.
    It makes the code harder to read, but if you comment it properly, you can have the best of both worlds: the speed and brevity of doing things procedurally in nested loops, and code that can be easily explained.

All this stuff can be used to generate statistics either as popup messages, or full-on sheets:

https://medium.com/media/440022e07990ffde267a55351321a3de

Here I use the same bidimensional array to build a report that just gets displayed on screen via the alert() function, and copied to a sheet, since the setValues() function takes a bidimensional array.

I was also surprised by how the usage of a few emojis can make both the code and the result more readable and clear. Notice for example how I create toasts (popup notifications within the application) showing the time elapsed and what the script is doing, by showing a progressively later time in the clock emoji.

On an empty day like this Sunday, this generates all 0s, but here is the result for you to get an idea:

As you can see, here a lot of titles of skills/positions are mentioned which were not on the script, because they come from a settings sheet:

It’s not pretty because I keep it hidden, and generate it from another script, and another document: you can make your sheets talk to each other this way!

But what about the HTML and CSS? Where’s the UI?

Well, at some point I made something like this:

It’s HTML with simple CSS, and it talks with Google Apps Script, and is done from the same Google Apps Script interface.

It seems trivial stuff, but when combined with the checkboxes I added to the sheet, it could save me a lot of time (especially in cases in which it needed to calculate a specific reminder for some actions, which I was previously doing manually.)

However, I realized it was much slower to display such a window to start my functions, compared to having them in a custom Google Sheets menu which you can easily build!

A lot can be done this way, but still, nothing compared to what can be done with time-driven triggers to fully automate some tasks, or what I managed to do writing my own Chrome Extension for internal use, which interacts two-ways with my Google Sheets, without even asking me anything.
As mentioned before, I think the best UI for productivity is no UI at all..!

Next time I will dig into these topics, showing also how to step up the game by using a standard development environment instead of Google’s nice but limited interface, and switching to TypeScript.

Now it feels like coding for real! 😀

If you were not into coding and you have questions about the code snippets or how to do other things, please feel free to ask in the comments, or directly to me, maybe by contacting me on LinkedIn.

No such thing as a stupid question, and by explaining to me what you’re trying to achieve, you might also give me a good idea for a new script or app.

In any case, whether you felt like this was trivial or difficult, I totally recommend starting to try this stuff yourself: it taught me many things, increased my productivity, and even improved my logical thinking in general!

Also do not hesitate to join our FusionWorks team to work together, we have lots of brilliant products in the pipeline.

Until the next time then, happy coding!

P.S.: A bit different from what I had foreshadowed, here is part 2 of this series!

Check the sequel! Recruiter Codes Chrome Extension — A LinkedIn Fairy(-tale)!

Best practices for user authentication

User authentication is a functionality every web app shares. It should have been perfected a long time ago, considering the number of times it has been implemented. However, there are so many mistakes made and vulnerabilities that haven’t been patched yet.

Below is a list of best practices regarding user authentication. This list will cover as many of the related pitfalls as possible. So here is the list of things that will make your backend authentication mechanisms much more secure:

  • Always use TLS. Every web API should use TLS (Transport Layer Security). TLS protects the information your API sends (and the information that users send to your API) by encrypting your messages while they’re in transit. In case you don’t use TLS on your website, a third party could intercept and read sensitive data that is in the process of transfer.
  • Use bcrypt/scrypt/PBKDF2 for password storing. Don’t use MD5 or SHA, as they are not the best for password storage. Long salt (per user) is mandatory (the aforementioned algorithms have it built-in).
  • Use API keys to give existing users programmatic access. While your REST endpoints can serve your own website, a big advantage of REST is that it provides a standard way for other programs to interact with your service. To keep things simple, don’t make your users do OAuth2 locally or make them provide a username/password combo — that would defeat the point of having used OAuth2 for authentication in the first place. Instead, keep things simple for yourself and your users, and issue API keys.
  • Use CSRF protection (e.g. CSRF one-time tokens that are verified with each request). Frameworks have such functionality built-in.
  • Set session expirations to avoid having forever-lasting sessions. Upon closing the website tab — the user’s session should expire.
  • Limit login attempts.
  • Don’t allow attackers to figure out if an email is registered or not by leaking information through error messages.
  • Forgotten password flow. Send one-time (or expiring) links to users when trying to reset a password.
  • Use the secure option for cookies. It will tell the browser to send cookies over SSL/TLS connections.
  • Don’t leak information through error messages — you shouldn’t allow attackers to figure out if an email is registered or not. If an email is not found, upon login, just report “Incorrect credentials.” On password resets, it may be something like “If your email is registered, you should have received a password reset email.” This is often at odds with usability — people don’t often remember the email they used to register, and the ability to check a number of them before getting in might be important. So this rule is not absolute, though it’s desirable, especially for more critical systems.

References

Set up a smarter routing system with Node.js and Express

At FusionWorks we enjoy using the NestJS framework, which helps a lot to keep our Node.js code clean and well-structured thanks to its modular architecture. But what if you only have Express at a hand? Could we achieve something similar?

Image generated by MidJourney AI for “developer creates smart routing system using Express”

In this tutorial, we’ll set up routes with “dummy” handler functions. On completion, we’ll have a modular structure for our route handling code, which we can extend with real handler functions. We’ll also have a really good understanding of how to create modular routes using Express!

Let’s say we have three routes (//home/about), each one with two different HTTP verbs (getpost).

Our goal here is to create a separate file for each path and make use of the Express Router object. The Router object is a collection of middleware and routes. It is a mini-app within the main app. It can only perform middleware and routing functions and can’t stand on its own.

// routes/root.js
const express = require("express");
const router = express.Router();
router
.route("/")
.get((req, res) => res.send("getting /"))
.post((req, res) => res.send("posting /"));

module.exports = router;// routes/home.js
const express = require("express");
const router = express.Router();
router
.route("/home")
.get((req, res) => res.send("get /home"))
.post((req, res) => res.send("post /home"));

module.exports = router;// routes/about.js
const express = require("express");
const router = express.Router();
router
.route("/about")
.get((req, res) => res.send("get /about"))
.post((req, res) => res.send("post /about"));

module.exports = router;// index.js
const express = require("express");
const app = express();
const port = 3000;
app.use("/", require("./routes/root"));
app.use("/home", require("./routes/home"));
app.use("/about", require("./routes/about"));
app.listen(port, () =>
console.log(`App listening at http://localhost:${port}`)
);

By having each route in its own file, we’re achieving a less messy code in the index.js file. The problem that still persists here — is that every time we add a new route file, the main file has to change as well, in order to map the path to the file.

Achieving a greater number of routes — produces the same problem: the main file gets bigger and messier.

We can solve this issue by creating a separate file that maps all the other routes and making use of it inside the main file.

// routes/index.js
module.exports = (app) => {
app.use("/", require("./root"));
app.use("/home", require("./home"));
app.use("/about", require("./about"));
};// index.js
const express = require("express");
const app = express();
const port = 3000;
const bootstrapRoutes = require("./routes");
bootstrapRoutes(app);
app.listen(port, () =>
console.log(`App listening at http://localhost:${port}`)
);

The routes folder index file receives the app instance from the main file and makes the path mapping. Now we have a cleaner main file, but we still have the problem that it’s required to manually map each path to its file.

This can be improved if we would loop through the routes folder’s files and map them accordingly. We’ll be using the filesystem readdirSync method. This method is used to synchronously read the contents of a given directory,
it also returns an array with all the file names or objects in the directory.

// routes/index.js
const snakeCase = require("lodash/snakeCase");
const express = require("express");
const fs = require("fs");
const path = require("path");module.exports = (app) => {
const files = fs.readdirSync(__dirname);
files.forEach(file => {
if (file === "index.js") {
return;
} const filePath =
file !== "root.js"
? file.replace(".js", "")
: "";

const router = express.Router(); const currentRoute =
require(path.join(__dirname, `/${filePath}`))(router); app.use(`/${filePath}`, currentRoute);
});
}

References

NestJS + JWT: complete hands-on guide

In today’s article, we’ll be building a complete JWT-based authentication module with logout and refresh functionality. Also, we’ll get through on how to use access and refresh tokens with PassportJS and NestJS. But first, let’s understand how this mechanism works in theory.

Image generated by MidJourney AI for “json web token protects API from hacker attacks”
  1. After authenticating successfully the client receives an access token.
  2. The client provides an access token every time a protected request is made.
  3. The server checks if the access token is valid, if so, the request is successful.
  4. In case the token is invalid, the server throws an error about the token being invalid.
  5. The client is able to get a new access token by authenticating or refreshing it by presenting the refresh token.
  6. The server validates the refresh token, and if valid, issues a new access token and a refresh token.

Now that we have a solid grasp of how the mechanism works in theory, let’s try and put it into practice.

Prerequisites

In this guide we’ll be using regular REST for our endpoints and Prisma as our ORM system, we’re also gonna need a hashing library to hash users’ passwords and tokens — we’ll be using bcrypt.
For our authentication strategy, we’re gonna install nestjs/jwt and passport-jwt.

We won’t cover the project setup or the Prisma & JWT setup, since this is not the purpose of our today’s article. You could check the respective NestJS documentation if you need more details on this:

Once done with the basics let’s dive in by setting up our authentication controller:

And the authentication service should look like this:

Now let’s add our first method in our auth.service.ts to retrieve a user’s tokens, use env variables for the expiresIn field, the refreshToken expiration time is usually about a week and the accessToken expiration time should be about 15 minutes.

Let’s also add a method that will update a user’s hashedRefreshToken field, see more here.

Let’s implement the login functionality inside auth.service.ts, we’ll be using the above-implemented methods, signTokens and updateRefreshToken:

So what happens here is — that on each login, we supply the client with fresh tokens and update the current user’s state with a hashed token which will be used in the future to refresh both the refresh token and the access token.

Let’s implement both the logout and refresh methods, the logout method will delete the user’s stored hashed token and the refresh method will compare if the issued token matches the one stored inside the user, if that’s the case — it will issue the client a pair of fresh tokens.

Let’s move on to our auth.service.ts, see more here.

Pay attention that our logout and refresh method — received userId as a parameter, we’re not gonna pass that parameter inside the body of our request but rather get it from the JWT of the current user — we’ll achieve that by implementing both strategy and guard functionality (we’ll use the @nestjs/passport AuthGuard for now), it will help to manage the state of the authenticated users (by issuing JWT tokens in our case and verifying their credentials).

We’ll need 2 different strategies, one for accessing all the endpoints and one for our refresh endpoint.

The first strategy will decode the JWT from the request, by setting the ignoreExpiration to false — it will also check its expiration and send it back through the AuthGuard, so we’ll be able to access it from the Req() decorator (by default under the property user).

By setting the passReqToCallback to true inside the second strategy, we have access to the request object inside the validate method, the “refresh strategy” will take the refresh token from the authorization header and send it to the controller through the AuthGuard.

Let’s proceed by implementing our strategies first:

Now, let’s update our logout and refresh endpoints inside the auth.controller.ts, we’ll pass our newly created strategies to the AuthGuard which will be passed inside the @UseGuards decorator, thus our endpoints will be secured accordingly, that way we’ll create a connection between our endpoint and the created strategy and we’ll have access to the request object that is fulfilled with JWT data inside the strategy.

So let’s go once again through what’s happening really:

  • The logout endpoint is secured by our guard that implements the jwt strategy, thus it can be accessed only if the client provides a valid access token, if the access token is invalid — the refresh endpoint should be called by the client to ask for a new pair of tokens.
  • The refresh endpoint has one important job — to let the client refresh his tokens (without having to log in again), in case the provided refresh token by the client is not valid, the user is forbidden from accessing any endpoints other than login (in the case of our guide).

So now — we have our own refresh token mechanism implementation. This can be improved of course by creating a custom AuthGuard for both of our cases (access and refresh), we may also create a custom decorator that will return the JWT data from the ExecutionContext, instead of accessing the Req() decorator.

Bonus

Since we have the backend implementation, let’s try to go through the frontend part.
As was mentioned before, the client asks for new tokens using the refresh token as soon as the main token expires, and then the client needs to use the freshly retrieved token for the next API call until that token expires as well. You can send the request for new tokens after the first 403 response for example, but in this guide, we’ll be using Axios for the HTTP requests and Redis to store the tokens:

The response interceptor verifies if the server returned a status code that shows an access token has expired. If that’s the case, a function that refreshes the access token gets called. That function returns and stores the tokens in Redis.

References

Using WebView with Flutter

Flutter comes with many packages that grant the possibility to run web pages in mobile apps. After research, the leader has been found: webview_flutter.

Why webview_flutter

First of all, it is currently maintained by the official Google team. When the development started, there were no official recommendations by the Flutter team on which package should be used.

The second reason why we started using this package — it gives us the possibility to share data between the web page and the flutter application in two directions. This feature was crucial when we investigated this case.

On iOS the WebView widget is backed by a WKWebView; On Android, the WebView widget is backed by a WebView.

How to set up webview_flutter

The setup is pretty easy. Install the library using the description on the official page.

How do we get data from a web page

JavascriptChannel gives us possibility to get data from web page. We set it up in the javascriptChannels list:

Firstly we choose the name for the channel in name parameter. This name will used to access channel from inside the web page. When we call the method channel from the page, onMessageReceived will be fired transporting the message.

Now let’s see how to send messages from the page. Firstly, webview_flutter mutates window object. If a web page has been loaded using this package it will have a property that we have defined in JavascriptChannel. In our case we can access the channel by calling:

We can use postMessage method to pass data to WebView and trigger onMessageReceived callback.

If you should use TypeScript in your project, you will need to override Window interface using following syntax:

How to pass data to a web page

In this particular case, we can not just use a JavascriptChannel, we should somehow inject some JavaScript code that will fire messages inside the web page. In this case, the web page will have a subscriber that will process the data received from the app.

The packagewebview_flutter comes with a solution. We can use WebViewController class that has runJavascript(String script) method:

Once script is executed and message is fired, a callback from inside the page is triggered:

Summary

In this article we have successfully transported data from a web page into Flutter application using WebView and vice versa.

Web page:

Send data: window.WebViewUserLocation.postMessage('');

Receive data: window.addEventListener('message', onMessageReceived);

Flutter app:

Send data:

Receive data: use JavascriptChannel

How to prevent sensitive data leakages through code repositories

Version control software (VCS) is essential for most modern software development practices. Among other benefits, software like Git, Mercurial, Bazaar, Perforce, CVS, and Subversion allows developers to save snapshots of their project history to enable better collaboration, revert to previous states, recover from unintended code changes, and manage multiple versions of the same codebase. These tools allow multiple developers to safely work on the same project and provide significant benefits even if you do not plan to share your work with others.

Although it is important to save your code in source control, it is equally important for some project assets to be kept out of your repository. Certain data like binary blobs and configuration files are best left out of source control for performance and usability reasons. But more importantly, sensitive data like passwords, secrets, and private keys should never be checked into a repository unprotected for security reasons.

Checking your Git Repository for Sensitive Data

First of all, once you started managing your secret security you need to check the repository for certain data. If you know an exact string that you want to search for, you can try using your VCS tool’s native search function to check whether the provided value is present in any commits. For example, with git, a command like this can search for a specific password:

git grep my_secret $(git rev-list --all)

Setting the security

Once you have removed sensitive data from the repository you should consider setting some internal tools to ensure you did not commit those files.

Ignoring Sensitive Files

The most basic way to keep files with sensitive data out of your repository is to leverage your VCS’s ignore functionality from the very beginning. VCS “ignore” files (like .gitignore) define patterns, directories, or files that should be excluded from the repository. These are a good first line of defense against accidentally exposing data. This strategy is useful because it does not rely on external tooling, the list of excluded items is automatically configured for collaborators, and it is easy to set up.

While VCS ignore functionality is useful as a baseline, it relies on keeping the ignore definitions up-to-date. It is easy to commit sensitive data accidentally prior to updating or implementing the ignore file. Ignore patterns that only have file-level granularity, so you may have to refactor some parts of your project if secrets are mixed in with code or other data that should be committed.

Using VCS Hooks to Check Files Prior to Committing

Most modern VCS implementations include a system called “hooks” for executing scripts before or after certain actions are taken within the repository. This functionality can be used to execute a script to check the contents of pending changes for sensitive material. The previously mentioned git-secrets tool has the ability to install pre-commit hooks that implement automatic checking for the type of content it evaluates. You can add your own custom scripts to check for whatever patterns you’d like to guard against.

Repository hooks provide a much more flexible mechanism for searching for and guarding against the addition of sensitive data at the time of commit. This increased flexibility comes at the cost of having to script all of the behavior you’d like to implement, which can potentially be a difficult process depending on the type of data you want to check. An additional consideration is that hooks are not shared as easily as ignore files, as they are not part of the repository that other developers copy. Each contributor will need to set up the hooks on their own machine, which makes enforcement a more difficult problem.

Adding Files to the Staging Area Explicitly

While more localized in scope, one simple strategy that may help you to be more mindful of your commits is to only add items to the VCS staging area explicitly by name. While adding files by wildcard or expansion can save some time, being intentional about each file you want to add can help prevent accidental additions that might otherwise be included. A beneficial side effect of this is that it generally allows you to create more focused and consistent commits, which helps with many other aspects of collaborative work.

Rules that you need to consider:

  • Never store unencrypted secrets in .git repositories. 
    A secret in a private repo is like a password written on a $20 bill, you might trust the person you gave it to, but that bill can end up in hundreds of peoples hands as a part of multiple transactions and within multiple cash registers.
  • Avoid git add * commands on git. 
    Using wildcard commands like git add *or git add . can easily capture files that should not enter a git repository, this includes generated files, config files and temporary source code. Add each file by name when making a commit and use git status to list tracked and untracked files.
  • Don’t rely on code reviews to discover secrets.
    It is extremely important to understand that code reviews will not always detect secrets, especially if they are hidden in previous versions of code. The reason code reviews are not adequate protection is because reviewers are only concerned with the difference between current and proposed states of the code, they do not consider the entire history of the project.
  • Use local environment variables, when feasible.
    An environment variable is a dynamic object whose value is set outside of the application. This makes them easier to rotate without having to make changes within the application itself. It also removes the need to have these written within source code, making them more appropriate to handle sensitive data.
  • Use automated secrets scanning on repositories.
    Implement real time alerting on repositories and gain visibility over where your secrets are with tools like GitGuardian