Typesense is an open source, typo tolerant search engine that is optimized for instant sub-50ms searches, while providing an intuitive developer experience.
This is a list of (Fuzzy) Data Matching software. The software in this list is open source and/or freely available.
The term data matching is used to indicate the procedure of bringing together information from two or more records that are believed to belong to the same entity. Data matching has two applications: (1) to match data across multiple datasets (linkage) and (2) to match data within a dataset (deduplication). See the Wikipedia page about data matching for more information.
Similar terms: record linkage, data matching, deduplication, fuzzy matching, entity resolution
Suppose we want to combine a BERT-based named entity recognition (NER) model with a rule-based NER model built on top of spaCy. Although BERT's NER exhibits extremely high performance, it is usually combined with rule-based approaches for practical purposes. In such cases, what often bothers us is that tokens of spaCy and BERT are different, even if the input sentences are the same. For example, let's say the input sentence is "John Johanson 's house"; BERT tokenizes this sentence like
["john", "johan", "##son", "'", "s", "house"]and spaCy tokenizes it like
["John", "Johanson", "'s", "house"]. To combine the outputs, we need to calculate the correspondence between the two different token sequences. This correspondence is the "alignment".
h-feed is a simple, open format for publishing a stream or feed of h-entry posts, like complete posts on a home page or archive pages, or summaries or other brief lists of posts. h-feed is one of several open microformat draft standards suitable for embedding data in HTML.
We make very careful considerations about the interface and operation of the GNU coreutils, but unfortunately due to backwards compatibility reasons, some behaviours or defaults of these utilities can be confusing.
This information will continue to be updated and overlaps somewhat with the coreutils FAQ, with this list focusing on less frequent potential issues.
The Configuration as Code plugin is an opinionated way to configure Jenkins based on human-readable declarative configuration files. Writing such a file should be feasible without being a Jenkins expert, just translating into code a configuration process one is used to executing in the web UI.
HTTP is fundamental to modern development, from frontend to backend to mobile. But like any widespread mature standard, it's got some funky skeletons in the closet.
Some of these skeletons are little-known but genuinely useful features, some of them are legacy oddities relied on by billions of connections daily, and some of them really shouldn't exist at all. Let's look behind the curtain:
In this article I will detail caching strategies that don't give up correctness to gain speed. These strategies all do away with TTLs and instead use active invalidation.
TinyLetter is a personal newsletter service brought to you by the people behind Mailchimp. People use it to send updates, digests, and dispatches to their fans and friends.
Though they're built on the same infrastructure, TinyLetter is for people who don't need all the business features that come along with Mailchimp. Simplicity is at the heart of everything we do at TinyLetter.
TinyLetter is a completely free service.
WHIP is still in development and is not recommended for production use. Much is still missing or untested but for basic API use cases however, the performance numbers look pretty good so far.