It is priceless to be up-to-date, but it is also challenging given the massive amount of information generated every second. Today you have to go looking for information through search engines such as Google Search. I would like the information to find me instead!
Google Alerts is awesome. The idea of being notified whenever the results for your query change is brilliant. It works much like Kafka SQL: you write a query, and it notifies you of every new change.
Google Alerts sucks. The notification method is outdated. Emails, really!? They are so inconvenient compared to messenger apps. There is a huge delay between the actual data change and the alert, because the crawler is too busy to index the internet more often. And there are not enough search parameters for specific areas such as housing, rentals, or cars.
Problem
Imagine you are looking for a flat to buy in Poland. There is the otodom site for that, so searching for offers you are interested in is not a big deal. However, the results are static, so you have to repeat the search again and again, wasting time. I would like new offers to find me, not the other way around. Will Google Alerts do that for you? Of course, if a week's delay is not a problem.
(╯°□°)╯︵ ┻━┻
Solution
Consider what a bare-minimum implementation looks like, so that it can serve as a starting point to improve later if needed.
I will get notifications via Telegram Messenger, which also keeps the notification history. Luckily, it is supported by popular .NET loggers (I checked NLog, log4net, and Serilog), meaning I don't have to learn yet another API for Telegram. If you use another messenger, I'm pretty sure loggers support it just as well as Telegram; take Slack and Serilog, for instance: Serilog.Sinks.Slack.
Here is what sending a message to Telegram via the Serilog logger looks like.
    var logger = new LoggerConfiguration()
        .WriteTo.Telegram(config.TelegramApiKey, config.TelegramChatId)
        .CreateLogger();
    logger.Information("Offer #1 - https://the.internet/offer-1.html");
Since otodom does not have an API, the HTML page will be my API. HtmlAgilityPack will help me parse it.
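    using System.Collections.Generic;
    using System.Linq;
    using System.Threading.Tasks;
    using HtmlAgilityPack;

    public static class OtoDomCrawler
    {
        // Follows the "next" link page by page and collects every offer found.
        public static async Task<List<Page>> Search(string url)
        {
            var offers = new List<Page>();
            while (url != null)
            {
                var html = (await new HtmlWeb().LoadFromWebAsync(url)).DocumentNode;
                offers.AddRange(GetOffers(html));
                url = html.SelectSingleNode("//a[@data-dir='next']")?.GetAttributeValue("href", null);
            }
            return offers;
        }

        private static IEnumerable<Page> GetOffers(HtmlNode html)
        {
            // SelectNodes returns null (not an empty collection) when nothing matches.
            var nodes = html.SelectNodes("//*[@class='offer-item-details']");
            if (nodes == null) yield break;
            foreach (var offer in nodes)
            {
                var title = offer.SelectSingleNode(".//*[@class='offer-item-title']");
                var link = title?.AncestorsAndSelf("a").FirstOrDefault();
                yield return new Page
                {
                    // Drop the "#fragment" part so the same offer always yields the same link.
                    Link = link?.GetAttributeValue("href", null)?.Split("#")?.FirstOrDefault(),
                    Title = title?.InnerText
                };
            }
        }
    }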
To store the offers that have already been shown, I will use SQLite and Entity Framework, because it is the lowest-ceremony way I know.
    using System.ComponentModel.DataAnnotations;
    using System.Threading.Tasks;
    using Microsoft.EntityFrameworkCore;

    public class AlertsDatabase : DbContext
    {
        public DbSet<Page> Pages { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder opt) =>
            opt.UseSqlite("Data Source=Alerts.sqlite3");
    }

    public class Page
    {
        [Key] public string Link { get; set; }
        public string Title { get; set; }

        // Two pages are the same offer when their links match.
        // The pattern match avoids exceptions on null or on objects of another type.
        public override bool Equals(object compared) =>
            compared is Page other && string.Equals(Link, other.Link);

        public override int GetHashCode() => Link?.GetHashCode() ?? 0;
    }

    async Task EntityFramework_and_Sqlite_example()
    {
        using (var db = new AlertsDatabase())
        {
            await db.Database.EnsureCreatedAsync();
            db.Pages.Add(new Page
            {
                Link = "https://the.internet/offer-1.html",
                Title = "Offer #1"
            });
            await db.SaveChangesAsync();
        }
    }
.NET Core was chosen so that the console app can run on Linux, which lets me run it on my Linux machine in the cloud. It does not really matter, though; Windows will work just as well. The deploy script is as simple as the one below.
    dotnet publish --runtime ubuntu.16.04-x64 --configuration Release
    scp -r bin/Release/netcoreapp2.2/ubuntu.16.04-x64/publish/ root@__.___.___.___:/apps/MyAlerts/
    ssh root@__.___.___.___
    # then, on the server, make the binary executable
    chmod +x /apps/MyAlerts/Gaev.Blog.Examples.GoogleAlert
Since Linux is used, why not schedule the app via cron? On Windows, you can use the built-in Task Scheduler. I will use the */30 6-23 * * * cron schedule expression, which means "at every 30th minute past every hour from 6 through 23", that is, at :00 and :30 of each hour from 06:00 to 23:30.
    crontab -e
    # configure cron like this
    */30 6-23 * * * /apps/MyAlerts/Gaev.Blog.Examples.GoogleAlert
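
To double-check what that schedule means, here is a small sketch that expands the expression by hand. It is not a cron parser; CronPreview is a hypothetical helper written only for this illustration, and it assumes the usual cron reading where */30 in the minute field matches minutes 0 and 30.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: lists the times of day at which
// "*/30 6-23 * * *" would start the app.
public static class CronPreview
{
    public static List<TimeSpan> RunTimes() =>
        (from hour in Enumerable.Range(6, 18)   // hours 6..23 inclusive
         from minute in new[] { 0, 30 }         // "*/30" matches minutes 0 and 30
         select new TimeSpan(hour, minute, 0)).ToList();
}
```

That is 36 runs a day, the first at 06:00 and the last at 23:30, with nothing fired during the night.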
As a result, it took ~100 lines of code. A good start, considering that it works and delivers value!
    public static async Task Main(string[] args)
    {
        var config = new ConfigurationBuilder()
            .AddJsonFile("appsettings.json")
            .Build()
            .Get<Config>();
        using (var logger = new LoggerConfiguration()
            .WriteTo.Telegram(config.TelegramApiKey, config.TelegramChatId)
            .CreateLogger())
            try
            {
                using (var db = new AlertsDatabase())
                    await Sync(logger, db);
            }
            catch (Exception ex)
            {
                logger.Error(ex, "Oops :(");
            }
    }

    private static async Task Sync(Logger logger, AlertsDatabase db)
    {
        await db.Database.EnsureCreatedAsync();
        var foundPages = await OtoDomCrawler.Search(SpecificOtoDomSearchUrl);
        var knownPages = await db.Pages.ToListAsync();
        // Except compares pages via Page.Equals/GetHashCode, i.e. by Link.
        var newPages = foundPages.Except(knownPages).ToList();
        db.Pages.AddRange(newPages);
        await db.SaveChangesAsync();
        foreach (var page in newPages)
            logger.Information($"[{page.Title}]({page.Link})");
    }
Here SpecificOtoDomSearchUrl is the search URL with my custom filter for the flats I would like to find, e.g. 3 rooms, in Krakow, with a price between X and Y.
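
The Sync method finds new offers with Except, which compares objects through Equals and GetHashCode, so two pages count as the same offer when their links match, even if the title differs. Below is a minimal, dependency-free sketch of that idea. Offer and OfferDiff are hypothetical names used only for illustration; they are not part of the example project.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the Page entity: identity is the Link only.
public sealed class Offer : IEquatable<Offer>
{
    public string Link { get; set; }
    public string Title { get; set; }

    public bool Equals(Offer other) => other != null && string.Equals(Link, other.Link);
    public override bool Equals(object obj) => Equals(obj as Offer);
    public override int GetHashCode() => Link?.GetHashCode() ?? 0;
}

public static class OfferDiff
{
    // Returns the offers that are in 'found' but not in 'known', compared by Link.
    public static List<Offer> NewOffers(IEnumerable<Offer> found, IEnumerable<Offer> known) =>
        found.Except(known).ToList();
}
```

Except is a set operation, so an offer that shows up twice in the crawl (for example, promoted on two result pages) is reported only once.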
If you are searching for a flat, car, ticket, or job, look how easy it is to be the first. Information finds you!
Do you want this but are not a technical person? No problem: contact me (LinkedIn, Twitter) and I will help.
Here you can find the complete example: Gaev.Blog.Examples.GoogleAlert.
Pitfalls
Along the way I came across a few problems and considerations:
- Setting up a Telegram bot is a bit tricky.
- If you want to show the alerts to several people (wife, your agency, friends), you can write to Telegram channels.
- The search result should be small enough. The capabilities of a single console app are limited.
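
On the first pitfall, the part that usually trips people up is finding the chat id. A rough outline, as far as I know the Telegram Bot API: create the bot through @BotFather to get the API key, send the bot any message (or add it to your channel as an administrator), then call the getUpdates method and read the chat id from the JSON response. The sketch below only wraps that call; TelegramSetup and its method names are hypothetical and not part of the example project.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper for the one-time bot setup.
public static class TelegramSetup
{
    // Builds the Bot API URL for a method such as "getUpdates".
    public static string MethodUrl(string botToken, string method) =>
        $"https://api.telegram.org/bot{botToken}/{method}";

    // Returns the raw JSON with recent updates; look for "chat":{"id":...} in it.
    public static async Task<string> GetUpdatesJson(string botToken)
    {
        using (var http = new HttpClient())
            return await http.GetStringAsync(MethodUrl(botToken, "getUpdates"));
    }
}
```

The API key and the chat id found this way are the two values that go into TelegramApiKey and TelegramChatId in the configuration.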