Bulletin was an online newsletter platform launched by Facebook on July 6, 2021, that allows notable writers to make announcements directly to their subscribers. Its competitors included Substack, of which Bulletin was called a "near-clone." Writers participating in the platform's launch included Malcolm Gladwell, Mitch Albom, Tan France, Jessica Yellin, Jane Wells, Erin Andrews and Dorie Greenspan. Facebook CEO Mark Zuckerberg stated that Bulletin represented the first time that the company had "built a project that is directly for journalists and individual writers." In October 2022 Meta announced the shutdown of Bulletin. The platform went into read only mode in January 2023 and became unavailable in April 2023. == History == Facebook announced Bulletin as its online newsletter platform on June 29, 2021. and launched by the company on July 6, 2021. Facebook CEO Mark Zuckerberg touted the service by saying that Bulletin represented the first time that the company had "built a project that is directly for journalists and individual writers." Writers participating in the platform's launch included Malcolm Gladwell, Mitch Albom, Tan France, Jessica Yellin, Jane Wells, Erin Andrews and Dorie Greenspan. == Reception == Unlike competitor such as Substack, Facebook indicated upon service's launch that it would not take a cut of subscription fees of writers using that platform. According to Washington Post technology writer Will Oremus, the move was criticized by those who viewed it as a form of predatory pricing intended by Facebook to force those competitors out of business. Sandeep Vaheesan, legal director of the think tank Open Markets, called for the government to reexamine predatory pricing as a violation of antitrust law, saying, "We want companies to compete by making better products, investing in new equipment and tech — not purely relying on their financial advantages to capture market share."
Computational heuristic intelligence
Computational heuristic intelligence (CHI) refers to specialized programming techniques in computational intelligence (also called artificial intelligence, or AI). These techniques have the express goal of avoiding complexity issues, also called NP-hard problems, by using human-like techniques. They are best summarized as the use of exemplar-based methods (heuristics), rather than rule-based methods (algorithms). Hence the term is distinct from the more conventional computational algorithmic intelligence, or symbolic AI. An example of a CHI technique is the encoding specificity principle of Tulving and Thompson. In general, CHI principles are problem solving techniques used by people, rather than programmed into machines. It is by drawing attention to this key distinction that the use of this term is justified in a field already replete with confusing neologisms. Note that the legal systems of all modern human societies employ both heuristics (generalisations of cases) from individual trial records as well as legislated statutes (rules) as regulatory guides. Another recent approach to the avoidance of complexity issues is to employ feedback control rather than feedforward modeling as a problem-solving paradigm. This approach has been called computational cybernetics, because (a) the term 'computational' is associated with conventional computer programming techniques which represent a strategic, compiled, or feedforward model of the problem, and (b) the term 'cybernetic' is associated with conventional system operation techniques which represent a tactical, interpreted, or feedback model of the problem. Of course, real programs and real problems both contain both feedforward and feedback components. A real example which illustrates this point is that of human cognition, which clearly involves both perceptual (bottom-up, feedback, sensor-oriented) and conceptual (top-down, feedforward, motor-oriented) information flows and hierarchies. The AI engineer must choose between mathematical and cybernetic problem solution and machine design paradigms. This is not a coding (program language) issue, but relates to understanding the relationship between the declarative and procedural programming paradigms. The vast majority of STEM professionals never get the opportunity to design or implement pure cybernetic solutions. When pushed, most responders will dismiss the importance of any difference by saying that all code can be reduced to a mathematical model anyway. Unfortunately, not only is this belief false, it fails most spectacularly in many AI scenarios. Mathematical models are not time agnostic, but by their very nature are pre-computed, i.e. feedforward. Dyer [2012] and Feldman [2004] have independently investigated the simplest of all somatic governance paradigms, namely control of a simple jointed limb by a single flexor muscle. They found that it is impossible to determine forces from limb positions- therefore, the problem cannot have a pre-computed (feedforward) mathematical solution. Instead, a top-down command bias signal changes the threshold feedback level in the sensorimotor loop, e.g. the loop formed by the afferent and efferent nerves, thus changing the so-called ‘equilibrium point’ of the flexor muscle/ elbow joint system. An overview of the arrangement reveals that global postures and limb position are commanded in feedforward terms, using global displacements (common coding), with the forces needed being computed locally by feedback loops. This method of sensorimotor unit governance, which is based upon what Anatol Feldman calls the ‘equilibrium Point’ theory, is formally equivalent to a servomechanism such as a car's ‘cruise control’.
Frequent pattern discovery
Frequent pattern discovery (or FP discovery, FP mining, or Frequent itemset mining) is part of knowledge discovery in databases, Massive Online Analysis, and data mining; it describes the task of finding the most frequent and relevant patterns in large datasets. The concept was first introduced for mining transaction databases. Frequent patterns are defined as subsets (itemsets, subsequences, or substructures) that appear in a data set with frequency no less than a user-specified or auto-determined threshold. == Techniques == Techniques for FP mining include: market basket analysis cross-marketing catalog design clustering classification recommendation systems For the most part, FP discovery can be done using association rule learning with particular algorithms Eclat, FP-growth and the Apriori algorithm. Other strategies include: Frequent subtree mining Structure mining Sequential pattern mining and respective specific techniques. Implementations exist for various machine learning systems or modules like MLlib for Apache Spark.
ID3 algorithm
In decision tree learning, ID3 (Iterative Dichotomiser 3) is a greedy algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm. The 3 in the name is meant to signify that this was Quinlan's third attempt at a model based on entropy-based splitting, and the term dichotimser is a misnomer as it implies a binary split, but the ID3 algorithm can split on multi-valued attributes. == Algorithm == The ID3 algorithm begins with the original set S {\displaystyle S} as the root node. On each iteration of the algorithm, it iterates through every unused attribute of the set S {\displaystyle S} and calculates the entropy H ( S ) {\displaystyle \mathrm {H} {(S)}} or the information gain I G ( S ) {\displaystyle IG(S)} of that attribute. It then selects the attribute which has the smallest entropy (or largest information gain) value. The set S {\displaystyle S} is then split or partitioned by the selected attribute to produce subsets of the data. (For example, a node can be split into child nodes based upon the subsets of the population whose ages are less than 50, between 50 and 100, and greater than 100.) The algorithm continues to recurse on each subset, considering only attributes never selected before. Recursion on a subset may stop in one of these cases: every element in the subset belongs to the same class; in which case the node is turned into a leaf node and labelled with the class of the examples. there are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the most common class of the examples in the subset. there are no examples in the subset, which happens when no example in the parent set was found to match a specific value of the selected attribute. An example could be the absence of a person among the population with age over 100 years. Then a leaf node is created and labelled with the most common class of the examples in the parent node's set. Throughout the algorithm, the decision tree is constructed with each non-terminal node (internal node) representing the selected attribute on which the data was split, and terminal nodes (leaf nodes) representing the class label of the final subset of this branch. === Summary === Calculate the entropy of every attribute a {\displaystyle a} of the data set S {\displaystyle S} . Partition ("split") the set S {\displaystyle S} into subsets using the attribute for which the resulting entropy after splitting is minimized; or, equivalently, information gain is maximum. Make a decision tree node containing that attribute. Recurse on subsets using the remaining attributes. === Properties === ID3 does not guarantee an optimal solution. It can converge upon local optima. It uses a greedy strategy by selecting the locally best attribute to split the dataset on each iteration. The algorithm's optimality can be improved by using backtracking during the search for the optimal decision tree at the cost of possibly taking longer. ID3 can overfit the training data. To avoid overfitting, smaller decision trees should be preferred over larger ones. This algorithm usually produces small trees, but it does not always produce the smallest possible decision tree. ID3 is harder to use on continuous data than on factored data (factored data has a discrete number of possible values, thus reducing the possible branch points). If the values of any given attribute are continuous, then there are many more places to split the data on this attribute, and searching for the best value to split by can be time-consuming. === Usage === The ID3 algorithm is used by training on a data set S {\displaystyle S} to produce a decision tree which is stored in memory. At runtime, this decision tree is used to classify new test cases (feature vectors) by traversing the decision tree using the features of the datum to arrive at a leaf node. == The ID3 metrics == === Entropy === Entropy H ( S ) {\displaystyle \mathrm {H} {(S)}} is a measure of the amount of uncertainty in the (data) set S {\displaystyle S} (i.e. entropy characterizes the (data) set S {\displaystyle S} ). H ( S ) = ∑ x ∈ X − p ( x ) log 2 p ( x ) {\displaystyle \mathrm {H} {(S)}=\sum _{x\in X}{-p(x)\log _{2}p(x)}} Where, S {\displaystyle S} – The current dataset for which entropy is being calculated This changes at each step of the ID3 algorithm, either to a subset of the previous set in the case of splitting on an attribute or to a "sibling" partition of the parent in case the recursion terminated previously. X {\displaystyle X} – The set of classes in S {\displaystyle S} p ( x ) {\displaystyle p(x)} – The proportion of the number of elements in class x {\displaystyle x} to the number of elements in set S {\displaystyle S} When H ( S ) = 0 {\displaystyle \mathrm {H} {(S)}=0} , the set S {\displaystyle S} is perfectly classified (i.e. all elements in S {\displaystyle S} are of the same class). In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest entropy is used to split the set S {\displaystyle S} on this iteration. Entropy in information theory measures how much information is expected to be gained upon measuring a random variable; as such, it can also be used to quantify the amount to which the distribution of the quantity's values is unknown. A constant quantity has zero entropy, as its distribution is perfectly known. In contrast, a uniformly distributed random variable (discretely or continuously uniform) maximizes entropy. Therefore, the greater the entropy at a node, the less information is known about the classification of data at this stage of the tree; and therefore, the greater the potential to improve the classification here. As such, ID3 is a greedy heuristic performing a best-first search for locally optimal entropy values. Its accuracy can be improved by preprocessing the data. === Information gain === Information gain I G ( A ) {\displaystyle IG(A)} is the measure of the difference in entropy from before to after the set S {\displaystyle S} is split on an attribute A {\displaystyle A} . In other words, how much uncertainty in S {\displaystyle S} was reduced after splitting set S {\displaystyle S} on attribute A {\displaystyle A} . I G ( S , A ) = H ( S ) − ∑ t ∈ T p ( t ) H ( t ) = H ( S ) − H ( S | A ) . {\displaystyle IG(S,A)=\mathrm {H} {(S)}-\sum _{t\in T}p(t)\mathrm {H} {(t)}=\mathrm {H} {(S)}-\mathrm {H} {(S|A)}.} Where, H ( S ) {\displaystyle \mathrm {H} (S)} – Entropy of set S {\displaystyle S} T {\displaystyle T} – The subsets created from splitting set S {\displaystyle S} by attribute A {\displaystyle A} such that S = ⋃ t ∈ T t {\displaystyle S=\bigcup _{t\in T}t} p ( t ) {\displaystyle p(t)} – The proportion of the number of elements in t {\displaystyle t} to the number of elements in set S {\displaystyle S} H ( t ) {\displaystyle \mathrm {H} (t)} – Entropy of subset t {\displaystyle t} In ID3, information gain can be calculated (instead of entropy) for each remaining attribute. The attribute with the largest information gain is used to split the set S {\displaystyle S} on this iteration.
Nearest neighbor search
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values. Formally, the nearest neighbor (NN) search problem is defined as follows: given a set S of points in a space M and a query point q ∈ M {\displaystyle q\in M} , find the closest point in S to q. Donald Knuth in volume 3 of The Art of Computer Programming (1973) called it the post-office problem, referring to an application of assigning to a residence the nearest post office. A direct generalization of this problem is a k-NN search, where we need to find the k closest points. Most commonly M is a metric space and dissimilarity is expressed as a distance metric, which is symmetric and satisfies the triangle inequality. Even more common, M is taken to be the d-dimensional vector space where dissimilarity is measured using the Euclidean distance, Manhattan distance or other distance metric. However, the dissimilarity function can be arbitrary. One example is asymmetric Bregman divergence, for which the triangle inequality does not hold. == Applications == The nearest neighbor search problem arises in numerous fields of application, including: Pattern recognition – in particular for optical character recognition Statistical classification – see k-nearest neighbor algorithm Computer vision – for point cloud registration Computational geometry – see Closest pair of points problem Cryptanalysis – for lattice problem Databases – e.g. content-based image retrieval Coding theory – see maximum likelihood decoding Semantic search Vector databases, where nearest-neighbor lookup over embeddings is used to retrieve semantically similar records Retrieval-augmented generation systems, where nearest-neighbor retrieval over embeddings is used to fetch candidate passages or documents before generation Data compression – see MPEG-2 standard Robotic sensing Recommendation systems, e.g. see Collaborative filtering Internet marketing – see contextual advertising and behavioral targeting DNA sequencing Spell checking – suggesting correct spelling Plagiarism detection Similarity scores for predicting career paths of professional athletes. Cluster analysis – assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense, usually based on Euclidean distance Chemical similarity Sampling-based motion planning == Methods == Various solutions to the NNS problem have been proposed. The quality and usefulness of the algorithms are determined by the time complexity of queries as well as the space complexity of any search data structures that must be maintained. The informal observation usually referred to as the curse of dimensionality states that there is no general-purpose exact solution for NNS in high-dimensional Euclidean space using polynomial preprocessing and polylogarithmic search time. === Exact methods === ==== Linear search ==== The simplest solution to the NNS problem is to compute the distance from the query point to every other point in the database, keeping track of the "best so far". This algorithm, sometimes referred to as the naive approach, has a running time of O(dN), where N is the cardinality of S and d is the dimensionality of S. There are no search data structures to maintain, so the linear search has no space complexity beyond the storage of the database. Naive search can, on average, outperform space partitioning approaches on higher dimensional spaces. The absolute distance is not required for distance comparison, only the relative distance. In geometric coordinate systems the distance calculation can be sped up considerably by omitting the square root calculation from the distance calculation between two coordinates. The distance comparison will still yield identical results. ==== Space partitioning ==== Since the 1970s, the branch and bound methodology has been applied to the problem. In the case of Euclidean space, this approach encompasses spatial index or spatial access methods. Several space-partitioning methods have been developed for solving the NNS problem. Perhaps the simplest is the k-d tree, which iteratively bisects the search space into two regions containing half of the points of the parent region. Queries are performed via traversal of the tree from the root to a leaf by evaluating the query point at each split. Depending on the distance specified in the query, neighboring branches that might contain hits may also need to be evaluated. For constant dimension query time, average complexity is O(log N) in the case of randomly distributed points, worst case complexity is O(kN^(1-1/k)) Alternatively the R-tree data structure was designed to support nearest neighbor search in dynamic context, as it has efficient algorithms for insertions and deletions such as the R tree. R-trees can yield nearest neighbors not only for Euclidean distance, but can also be used with other distances. In the case of general metric space, the branch-and-bound approach is known as the metric tree approach. Particular examples include vp-tree and BK-tree methods. Using a set of points taken from a 3-dimensional space and put into a BSP tree, and given a query point taken from the same space, a possible solution to the problem of finding the nearest point-cloud point to the query point is given in the following description of an algorithm. (Strictly speaking, no such point may exist, because it may not be unique. But in practice, usually we only care about finding any one of the subset of all point-cloud points that exist at the shortest distance to a given query point.) The idea is, for each branching of the tree, guess that the closest point in the cloud resides in the half-space containing the query point. This may not be the case, but it is a good heuristic. After having recursively gone through all the trouble of solving the problem for the guessed half-space, now compare the distance returned by this result with the shortest distance from the query point to the partitioning plane. This latter distance is that between the query point and the closest possible point that could exist in the half-space not searched. If this distance is greater than that returned in the earlier result, then clearly there is no need to search the other half-space. If there is such a need, then you must go through the trouble of solving the problem for the other half space, and then compare its result to the former result, and then return the proper result. The performance of this algorithm is nearer to logarithmic time than linear time when the query point is near the cloud, because as the distance between the query point and the closest point-cloud point nears zero, the algorithm needs only perform a look-up using the query point as a key to get the correct result. === Approximation methods === An approximate nearest neighbor search algorithm is allowed to return points whose distance from the query is at most c {\displaystyle c} times the distance from the query to its nearest points. The appeal of this approach is that, in many cases, an approximate nearest neighbor is almost as good as the exact one. In particular, if the distance measure accurately captures the notion of user quality, then small differences in the distance should not matter. ==== Greedy search in proximity neighborhood graphs ==== Proximity graph methods (such as navigable small world graphs and HNSW) are considered the current state-of-the-art for the approximate nearest neighbors search. The methods are based on greedy traversing in proximity neighborhood graphs G ( V , E ) {\displaystyle G(V,E)} in which every point x i ∈ S {\displaystyle x_{i}\in S} is uniquely associated with vertex v i ∈ V {\displaystyle v_{i}\in V} . The search for the nearest neighbors to a query q in the set S takes the form of searching for the vertex in the graph G ( V , E ) {\displaystyle G(V,E)} . The basic algorithm – greedy search – works as follows: search starts from an enter-point vertex v i ∈ V {\displaystyle v_{i}\in V} by computing the distances from the query q to each vertex of its neighborhood { v j : ( v i , v j ) ∈ E } {\displaystyle \{v_{j}:(v_{i},v_{j})\in E\}} , and then finds a vertex with the minimal distance value. If the distance value between the query and the selected vertex is smaller than the one between the query and the current element, then the algorithm moves to the selected vertex, and it becomes new enter-point. The algorithm stops when it reaches a local minimum: a vertex whose neighborhood does not contain a vertex that is closer to the query than the vertex itself. The idea of proximity neighborhood graphs was exploited in multiple publications, including the seminal paper by Arya and Mount, in the VoroNet syst
Cooperative storage cloud
A cooperative storage cloud is a decentralized model of networked online storage where data is stored on multiple computers (nodes), hosted by the participants cooperating in the cloud. For the cooperative scheme to be viable, the total storage contributed in aggregate must be at least equal to the amount of storage needed by end users. However, some nodes may contribute less storage and some may contribute more. There may be reward models to compensate the nodes contributing more. Unlike a traditional storage cloud, a cooperative does not directly employ dedicated servers for the actual storage of the data, thereby eliminating the need for a significant dedicated hardware investment. Each node in the cooperative runs specialized software which communicates with a centralized control and orchestration server, thereby allowing the node to both consume and contribute storage space to the cloud. The centralized control and orchestration server requires several orders of magnitude less resources (storage, computing power, and bandwidth) to operate, relative to the overall capacity of the cooperative. == Data security == Files hosted in the cloud are fragmented and encrypted before leaving the local machine. They are then distributed randomly using a load balancing and geo-distribution algorithm to other nodes in the cooperative. Users can add an additional layer of security and reduce storage space by compressing and encrypting files before they are copied to the cloud. == Data redundancy == In order to maintain data integrity and high availability across a relatively unreliable set of computers over a wide area network like the Internet, the source node will add some level of redundancy to each data block. This allows the system to recreate the entire block even if some nodes are temporarily unavailable (due to loss of network connectivity, the machine being powered off or a hardware failure). The most storage and bandwidth efficient forms of redundancy use erasure coding techniques like Reed–Solomon. A simple, less CPU intensive but more expensive form of redundancy is duplicate copies. == Flexible contribution == Due to bandwidth or hardware constraints some nodes may not be able to contribute as much space as they consume in the cloud. On the other hand, nodes with large storage space and limited or no bandwidth constraints may contribute more than they consume, thereby the cooperative can stay in balance.
Differential evolution
Differential evolution (DE) is an evolutionary algorithm to optimize a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Such methods are commonly known as metaheuristics as they make few or no assumptions about the optimized problem and can search very large spaces of candidate solutions. However, metaheuristics such as DE do not guarantee an optimal solution is ever found. DE is used for multidimensional real-valued functions but does not use the gradient of the problem being optimized, which means DE does not require the optimization problem to be differentiable, as is required by classic optimization methods such as gradient descent and quasi-newton methods. DE can therefore also be used on optimization problems that are not even continuous, are noisy, change over time, etc. DE optimizes a problem by maintaining a population of candidate solutions and creating new candidate solutions by combining existing ones according to its simple formulae, and then keeping whichever candidate solution has the best score or fitness on the optimization problem at hand. In this way, the optimization problem is treated as a black box that merely provides a measure of quality given a candidate solution and the gradient is therefore not needed. == History == Storn and Price introduced Differential Evolution in 1995. Books have been published on theoretical and practical aspects of using DE in parallel computing, multiobjective optimization, constrained optimization, and the books also contain surveys of application areas. Surveys on the multi-faceted research aspects of DE can be found in journal articles. == Algorithm == A basic variant of the DE algorithm works by having a population of candidate solutions (called agents). These agents are moved around in the search-space by using simple mathematical formulae to combine the positions of existing agents from the population. If the new position of an agent is an improvement then it is accepted and forms part of the population, otherwise the new position is simply discarded. The process is repeated and by doing so it is hoped, but not guaranteed, that a satisfactory solution will eventually be discovered. Formally, let f : R n → R {\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} } be the fitness function which must be minimized (note that maximization can be performed by considering the function h := − f {\displaystyle h:=-f} instead). The function takes a candidate solution as argument in the form of a vector of real numbers. It produces a real number as output which indicates the fitness of the given candidate solution. The gradient of f {\displaystyle f} is not known. The goal is to find a solution m {\displaystyle \mathbf {m} } for which f ( m ) ≤ f ( p ) {\displaystyle f(\mathbf {m} )\leq f(\mathbf {p} )} for all p {\displaystyle \mathbf {p} } in the search-space, which means that m {\displaystyle \mathbf {m} } is the global minimum. Let x ∈ R n {\displaystyle \mathbf {x} \in \mathbb {R} ^{n}} designate a candidate solution (agent) in the population. The basic DE algorithm can then be described as follows: Choose the parameters NP ≥ 4 {\displaystyle {\text{NP}}\geq 4} , CR ∈ [ 0 , 1 ] {\displaystyle {\text{CR}}\in [0,1]} , and F ∈ [ 0 , 2 ] {\displaystyle F\in [0,2]} . NP : NP {\displaystyle {\text{NP}}} is the population size, i.e. the number of candidate agents or "parents". CR : The parameter CR ∈ [ 0 , 1 ] {\displaystyle {\text{CR}}\in [0,1]} is called the crossover probability. F : The parameter F ∈ [ 0 , 2 ] {\displaystyle F\in [0,2]} is called the differential weight. Typical settings are N P = 10 n {\displaystyle NP=10n} , C R = 0.9 {\displaystyle CR=0.9} and F = 0.8 {\displaystyle F=0.8} . Optimization performance may be greatly impacted by these choices; see below. Initialize all agents x {\displaystyle \mathbf {x} } with random positions in the search-space. Until a termination criterion is met (e.g. number of iterations performed, or adequate fitness reached), repeat the following: For each agent x {\displaystyle \mathbf {x} } in the population do: Pick three agents a , b {\displaystyle \mathbf {a} ,\mathbf {b} } , and c {\displaystyle \mathbf {c} } from the population at random, they must be distinct from each other as well as from agent x {\displaystyle \mathbf {x} } . ( a {\displaystyle \mathbf {a} } is called the "base" vector.) Pick a random index R ∈ { 1 , … , n } {\displaystyle R\in \{1,\ldots ,n\}} where n {\displaystyle n} is the dimensionality of the problem being optimized. Compute the agent's potentially new position y = [ y 1 , … , y n ] {\displaystyle \mathbf {y} =[y_{1},\ldots ,y_{n}]} as follows: For each i ∈ { 1 , … , n } {\displaystyle i\in \{1,\ldots ,n\}} , pick a uniformly distributed random number r i ∼ U ( 0 , 1 ) {\displaystyle r_{i}\sim U(0,1)} If r i < C R {\displaystyle r_{i}