Esercizi aggregation su MongoDB - Books
file book.txt
Json del documento
{
"_id" : 9,
"title" : "Griffon in Action",
"isbn" : "1935182234",
"pageCount" : 375,
"publishedDate" : ISODate("2012-06-04T07:00:00Z"),
"thumbnailUrl" : "https://s3.amazonaws.com/AKIAJC5RLADLUMVRPFDQ.book-thumb-images/almiray.jpg",
"shortDescription" : "Griffon....",
"longDescription" : "Although...",
"status" : "PUBLISH",
"authors" : [ "Andres Almiray", "Danno Ferrin", "", "James Shingler"],
"categories" : [ "Java"]
}
Query da fare:
- derive the categories of the books written by “Andres Almiray”
- derive the average length of the books on “Java”
- derive the shortest books in each category
- derive the most prolific authors wrt the number of books they have written
- rank the authors wrt the number of pages they have written ○ for a book with n authors and p pages, author contribution is estimated as p/n
- derive the most referred book categories
Group Accumulator Operators
$sum (Returns a sum of numerical values. Ignores non-numeric values.)
$avg (Returns an average of numerical values. Ignores non-numeric values.)
$first (Returns a value from the first document for each group. Order is only defined if the documents are in a defined order. - Available in $group stage only.)
$last (Returns a value from the last document for each group. Order is only defined if the documents are in a defined order. - Available in $group stage only.)
$max (Returns the highest expression value for each group. - Available in both $group and $project stages)
$min (Returns the lowest expression value for each group. - Available in both $group and $project stages)
$push (Returns an array of expression values for each group. - Available in $group stage only)
$addToSet (Returns an array of unique expression values for each group. Order of the array elements is undefined. - Available in $group stage only.)
$stdDevPop (Returns the population standard deviation of the input values. - Available in both $group and $project stages)
$stdDevSamp (Returns the sample standard deviation of the input values. - Available in both $group and $project stages)
{$sort: { field1: sortOrder, field2 : sortOrder, ... } }
{ $cmp: [ expression1, expression2] } -1 0 oppure +1 in base al valori di comparazione
$addFields { $addFields: { newField: expression, ... } }
$size { $size: expression }
$ne{ $ne: [ expression1, expression2 ] } (diverso)
Limit { $limit: positiveInteger }
Varie
$gt (Maggiore - db.inventory.find( { qty: { $gt: 20 } } ) )
1) derive the categories of the books written by “Andres Almiray”
db.books.aggregate([
{ $match: { authors: {$in: ["Andres Almiray","$authors"]}}},
{ $group : { _id : "NULL", categories: {$addToSet: "$categories" }}},
{ $project: { _id:0, autore:"Andres Almiray", categories:1 }}
])
2) derive the average length of the books on “Java”
db.books.aggregate([
{ $match: { categories: {$in: ["Java", "$categories"]}}},
{ $group: { _id:"NULL", avarage: {$avg: "$pageCount"}}},
{ $project: {_id:0, JavaAvarageLenght: "$avarage"}}
]).pretty()
3) derive the shortest books in each category [io]
db.books.aggregate([
{ $unwind: "$categories" },
{ $sort : { pageCount : -1}},
{ $group: {_id: {categories:"$categories", title:"$title" }, npage:{$push: "$pageCount"}}},
{ $unwind: "$npage" },
{ $project: {_id:"NULL", categories:"$_id.categories", title:"$_id.title",npage:"$npage"}},
{ $sort : { npage : -1}},
{ $group: {_id:"$categories", book: {$last:"$title"}, npage: {$last: "$npage"}}},
{ $project: {_id:0, categories:"$_id", title:"$book",npage:"$npage"}}
]).pretty()
3) derive the shortest books in each category [Giusto - Proff]
db.books.aggregate([
{$unwind: "$categories"},
{$sort: {"pageCount":1}},
{$group: {"_id":{"category":"$categories"}, "books":{$addToSet: {"title":"$title","pages":"$pageCount"}}, "shortest":{$first:"$pageCount"}}},
{$unwind: "$books"},
{$addFields: {"toBeSelected": {$cmp: ["$books.pages","$shortest"]}}},
{$match: {"toBeSelected":0}},
{$group: {"_id":{"category":"$_id.category"}, "books":{$addToSet: {"title":"$books.title","pages":"$books.pages"}}, "cfr":{$first:"$shortest"}}}
])
4) derive the most prolific authors wrt the number of books they have written
db.books.aggregate([
{ $unwind: "$authors" },
{ $group:{ _id : "$authors", nbook:{$sum: 1} }},
{ $sort: { nbook:-1 }}
]).pretty()
5) rank the authors wrt the number of pages they have written (for a book with n authors and p pages, author contribution is estimated as p/n )
db.books.aggregate([
{ $addFields: { "nauthors" : { $size: "$authors" }}},
{ $match : { nauthors : {$gt: 0}}},
{ $addFields: { "authorsPage": {$divide: ["$pageCount", "$nauthors"]}}},
{ $unwind:"$authors" },
{ $group:{ _id:"$authors", npage: {$sum: "$authorsPage"}}},
{ $sort: { npage:-1 }}
{ $project: {_id:0, autor:"$_id", pagewritten:"$npage"}}
]).pretty()
6) derive the most referred book categories
db.books.aggregate([
{ $unwind : "$categories"},
{ $group: { _id:"$categories", t: {$sum: 1}}},
{ $sort: {t:-1}},
{ $limit: 1 },
{ $project: {_id:0, Category:"$_id", numberBook:"$t"}}
])